
Multiplication with a symmetrically wrapped CuSparseMatrix returns wrong answer #2572

Open
mipals opened this issue Nov 29, 2024 · 6 comments
Labels: bug


mipals commented Nov 29, 2024

Describe the bug

Multiplying with a Symmetric-wrapped CuSparseMatrix returns the product with the underlying CuSparseMatrix itself, not with the symmetric matrix.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

Id = [1, 2, 3, 4, 1, 4, 5, 2, 4, 6, 7, 3, 4, 8, 4, 5, 8, 9, 4, 5, 10]
Jd = [1, 2, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10]
V  = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0]
T = sparse(Id, Jd, V)  # upper-triangular pattern, so Symmetric(T) mirrors it
v = ones(10)
Symmetric(T) * v - T * v  # non-zero on the CPU, as expected
# However, the equivalent on the GPU does not behave the same way:
Tgpu = CuSparseMatrixCSR(T)
vgpu = CuVector(v)
Symmetric(Tgpu) * vgpu - Tgpu * vgpu  # all zeros: the wrapper is ignored

Expected behavior

I expected the GPU behaviour to match the CPU version.

Version info

Details on Julia:

Julia Version 1.11.0
Commit 501a4f25c2b (2024-10-07 11:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
Threads: 24 default, 0 interactive, 12 GC (on 24 virtual cores)
Environment:
  JULIA_EXCLUSIVE = 0
  LD_LIBRARY_PATH = /lsf/10.1/linux3.10-glibc2.17-x86_64/lib
  JULIA_NUM_THREADS = 1

Details on CUDA:

CUDA runtime 12.6, artifact installation
CUDA driver 12.7
NVIDIA driver 565.57.1

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+565.57.1

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.0
- LLVM: 16.0.6

1 device:
  0: Tesla V100-PCIE-16GB (sm_70, 8.250 GiB / 16.000 GiB available)
amontoison (Member) commented:

We don't have GPU routines specialized for symmetric matrices. The sparse matrix is just unwrapped from Symmetric or Hermitian.
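For illustration, a minimal sketch (not the actual CUDA.jl source) of the kind of fallback that produces this: the wrapper is peeled off and the product is taken with the raw storage.

using CUDA, CUDA.CUSPARSE, LinearAlgebra

# Hypothetical sketch: a method of roughly this shape, which strips the
# Symmetric wrapper and multiplies with the underlying CSR matrix,
# reproduces exactly what the MWE above observes.
LinearAlgebra.mul!(y::CuVector, A::Symmetric{<:Any,<:CuSparseMatrixCSR}, x::CuVector) =
    mul!(y, parent(A), x)  # the symmetry information is silently dropped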


mipals commented Nov 29, 2024

Yeah, I traced the multiplication and found that a call to _unwrap(A) is what causes this. I might be wrong, but I do not think the current behaviour is user-friendly.

I've looked a little into the cuSPARSE documentation, and a symmetric matrix type does seem to exist, exposed in CUSPARSE as CUSPARSE.CUSPARSE_MATRIX_TYPE_SYMMETRIC. The documentation does note that exploiting symmetry in mat-vecs is not very efficient, but when the mat-vec is only used for iterative refinement after solving a linear system with cuDSS (where only one triangle of the matrix is required), that overhead is probably ok/negligible?
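To make the intended use concrete, here is a rough sketch of the refinement loop I have in mind; solve and symv are hypothetical closures standing in for applying the cuDSS factorization and for a symmetric mat-vec assembled from one triangle:

using LinearAlgebra  # for norm

# Hedged sketch; `solve` and `symv` are placeholders, not existing CUDA.jl/CUDSS.jl API.
function refine(solve, symv, b; maxiter = 5, tol = 1e-12)
    x = solve(b)                 # initial direct solve (cuDSS)
    for _ in 1:maxiter
        r = b - symv(x)          # residual with the full symmetric operator
        norm(r) <= tol * norm(b) && break
        x += solve(r)            # apply the correction
    end
    return x
end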


amontoison commented Nov 29, 2024

I agree that it's not optimal, but symmetric storage is not supported by any GPU library (to the best of my knowledge).

I never wanted to add methods with Symmetric to CUSPARSE because of that, but some users wanted this behavior instead of an error.
Also, Symmetric is not officially a storage format; in many cases it is only used for dispatch. If we store both triangles, we still need to decide between uplo=:L and uplo=:U.

I checked your type in the CUDA documentation (https://docs.nvidia.com/cuda/cusparse/index.html?highlight=CUSPARSE_MATRIX_TYPE_SYMMETRIC#cusparsematrixtype-t); it is part of the legacy API, which we removed here a long time ago.
I will investigate whether we can do this with the generic API, but I'm not optimistic: even in the legacy API, this format was not supported by many routines.

amontoison (Member) commented:

@mipals I confirm that it is not supported by the new API, and it was also not supported for sparse matrix-vector and matrix-matrix products with the legacy API.


mipals commented Dec 1, 2024

Ahh, I did not realize that it was only part of the legacy API, and that it was not even supported there. That's a shame. Thanks for investigating!

I guess I can do it "manually" then, since mat-vecs with both the matrix and its transpose are supported. In that case I would need to subtract one copy of the diagonal, but the standard Diagonal(Tgpu) throws "ERROR: Scalar indexing is disallowed." What is the best way to get around this?
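Concretely, here is a sketch of what I mean, reusing Tgpu and vgpu from the MWE above and taking the diagonal from the CPU matrix T to sidestep the scalar-indexing error:

# Manual symmetric mat-vec from the stored upper triangle:
# Symmetric(T)*v == T*v + T'*v - diag(T).*v, since the diagonal is counted twice.
dgpu = CuVector(diag(T))                       # diagonal extracted on the CPU
y = Tgpu * vgpu + Tgpu' * vgpu - dgpu .* vgpu  # matches Symmetric(T) * v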

amontoison (Member) commented:

We never implemented a kernel for Diagonal(Tgpu) when Tgpu is a sparse matrix on the GPU. We should add one.
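A first cut could look like the following hedged sketch against CUDA.jl's kernel-programming API (not the eventual implementation); one thread scans each row of the CSR structure for its diagonal entry:

using CUDA, CUDA.CUSPARSE

# Hedged sketch: extract the diagonal of a CuSparseMatrixCSR into a dense CuVector.
# Field names (rowPtr, colVal, nzVal) follow the 1-based CSR storage in CUDA.jl.
function diagvec(A::CuSparseMatrixCSR{Tv}) where {Tv}
    d = CUDA.zeros(Tv, minimum(size(A)))
    function kernel(rowPtr, colVal, nzVal, d)
        row = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if row <= length(d)
            for k in rowPtr[row]:(rowPtr[row+1] - 1)  # stored entries of this row
                colVal[k] == row && (d[row] = nzVal[k])
            end
        end
        return
    end
    threads = 256
    @cuda threads=threads blocks=cld(length(d), threads) kernel(A.rowPtr, A.colVal, A.nzVal, d)
    return d
end

With something like this, Diagonal(diagvec(Tgpu)) would give the wrapper requested above.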
