This sample describes how to use the cuSPARSE and cuBLAS libraries to implement the incomplete-Cholesky preconditioned Conjugate Gradient (CG) iterative method.
The solution of large sparse linear systems is an important problem in computational mechanics, atmospheric modeling, geophysics, biology, circuit simulation, and many other applications in the field of computational science and engineering. In general, these linear systems can be solved using direct or preconditioned iterative methods. Although the direct methods are often more reliable, they usually have large memory requirements and do not scale well on massively parallel computer platforms.
The iterative methods are more amenable to parallelism and therefore can be used to solve larger problems. Currently, the most popular iterative schemes belong to the Krylov subspace family of methods. They include the Bi-Conjugate Gradient Stabilized (BiCGStab) and Conjugate Gradient (CG) iterative methods for non-symmetric and symmetric positive definite (s.p.d.) linear systems, respectively. We describe the CG method in more detail in the next section.
In practice, we typically use a variety of preconditioning techniques to improve the convergence of the iterative methods. In this sample, we focus on incomplete-Cholesky preconditioning, which is one of the most popular of these techniques. It computes an incomplete factorization of the coefficient matrix and requires the solution of a lower and an upper triangular linear system in every iteration of the iterative method.
In order to implement the preconditioned CG, we use the sparse matrix-vector multiplication and the sparse triangular solve implemented in the cuSPARSE library. We point out that the parallelism available in these iterative methods depends highly on the sparsity pattern of the coefficient matrix at hand.
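To make the two kernels concrete, here is a minimal CPU sketch in plain C, assuming the matrix is stored in CSR format with sorted column indices. The function names `csr_spmv` and `csr_lower_solve` are hypothetical helpers for illustration and are not cuSPARSE routines. The sketch also shows why the sparsity pattern matters: the rows of the matrix-vector product are independent, while each row of the triangular solve must wait for the earlier unknowns it references.

```c
#include <assert.h>

/* y = A * x for an n-row matrix A in CSR format (hypothetical helper). */
void csr_spmv(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i) {      /* every row can be computed independently */
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Solve L * x = b by forward substitution (hypothetical helper).
 * Assumption: L is lower triangular in CSR format with sorted column
 * indices, so the diagonal is the last stored entry of each row. */
void csr_lower_solve(int n, const int *row_ptr, const int *col_idx,
                     const double *val, const double *b, double *x)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i) {      /* row i depends on earlier entries of x */
        int diag = row_ptr[i + 1] - 1;
        assert(col_idx[diag] == i);    /* diagonal must be present and last */
        double sum = b[i];
        for (int k = row_ptr[i]; k < diag; ++k)
            sum -= val[k] * x[col_idx[k]];
        x[i] = sum / val[diag];
    }
}
```

Rows of L that reference no unfinished unknowns can be solved at the same time, so a sparser or more favorably ordered pattern leaves more independent rows per step.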
Notice that in every iteration of the incomplete-Cholesky preconditioned CG iterative method, we need to perform one sparse matrix-vector multiplication and two triangular solves. The corresponding CG code using the cuSPARSE and cuBLAS libraries in the C programming language is shown below.
The PDF version is also available here.
The code contains line references to the above algorithm.
- Command line
  gcc -I<cuda_toolkit_path>/include cg_example.c -o cg_example -lcudart -lcusparse -lcublas
- Linux
  make
- Windows/Linux
  mkdir build
  cd build
  cmake ..
  make
On Windows, instead of running the last build step, open the Visual Studio Solution that was created and build.
- Supported SM Architectures: SM 3.5, SM 3.7, SM 5.0, SM 5.2, SM 5.3, SM 6.0, SM 6.1, SM 6.2, SM 7.0, SM 7.2, SM 7.5, SM 8.0, SM 8.6, SM 8.9, SM 9.0
- Supported OSes: Linux, Windows, QNX, Android
- Supported CPU Architectures: x86_64, ppc64le, arm64
- Supported Compilers: gcc, clang, Intel icc, IBM xlc, Microsoft msvc, Nvidia HPC SDK nvc
- Language: C99
- CUDA 11.3 toolkit (or above) and compatible driver (see CUDA Driver Release Notes).
- CMake 3.9 or above on Windows
Creating 5-point time-dependent diffusion matrix.
grid size: 700 x 700
matrix rows: 490000
matrix cols: 490000
nnz: 2447200
Testing CG
CG loop:
Initial Residual: Norm 4.633034e+01' threshold 4.633034e-07
Iteration = 0; Error Norm = 4.633034e+01
Iteration = 1; Error Norm = 5.843251e+01
Iteration = 2; Error Norm = 2.294159e+01
Iteration = 3; Error Norm = 3.503917e+01
Iteration = 4; Error Norm = 1.641601e+01
Iteration = 5; Error Norm = 2.489376e+01
...
Iteration = 83; Error Norm = 3.124845e-06
Iteration = 84; Error Norm = 1.739886e-06
Iteration = 85; Error Norm = 1.073478e-06
Iteration = 86; Error Norm = 1.482683e-06
Iteration = 87; Error Norm = 5.830292e-07
Iteration = 88; Error Norm = 9.759474e-07
Check Solution
Final error norm = 3.024808e-07
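The matrix sizes reported in the output above are consistent with a standard 5-point stencil on a 700 x 700 grid without periodic boundaries: 5 * 700^2 - 4 * 700 = 2447200 nonzeros. The sketch below builds only the CSR sparsity pattern of such a stencil so the count can be checked. It is an assumption that the sample assembles its matrix this way, the matrix values are not reproduced, and `stencil5_pattern` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Build the CSR sparsity pattern of a 5-point stencil on a grid x grid mesh
 * (no wrap-around at the boundaries). On success returns the number of
 * nonzeros and hands ownership of both arrays to the caller (free them with
 * free()); returns -1 if allocation fails. */
int stencil5_pattern(int grid, int **row_ptr_out, int **col_idx_out)
{
    assert(grid > 0);
    int rows = grid * grid;
    int *row_ptr = malloc(((size_t)rows + 1) * sizeof *row_ptr);
    int *col_idx = malloc(5 * (size_t)rows * sizeof *col_idx);  /* upper bound */
    if (!row_ptr || !col_idx) { free(row_ptr); free(col_idx); return -1; }

    int nnz = 0;
    for (int iy = 0; iy < grid; ++iy) {
        for (int ix = 0; ix < grid; ++ix) {
            int row = iy * grid + ix;
            row_ptr[row] = nnz;
            /* Neighbors are appended in ascending column order. */
            if (iy > 0)        col_idx[nnz++] = row - grid;  /* south */
            if (ix > 0)        col_idx[nnz++] = row - 1;     /* west  */
            col_idx[nnz++] = row;                            /* diagonal */
            if (ix < grid - 1) col_idx[nnz++] = row + 1;     /* east  */
            if (iy < grid - 1) col_idx[nnz++] = row + grid;  /* north */
        }
    }
    row_ptr[rows] = nnz;
    *row_ptr_out = row_ptr;
    *col_idx_out = col_idx;
    return nnz;
}
```

Calling `stencil5_pattern(700, &row_ptr, &col_idx)` gives 490000 rows and the nonzero count derived above. Interior grid points contribute five entries per row, edge points four, and corner points three.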