This code demonstrates the usage of the cuSOLVER gesv functions, introduced in CUDA 10.2, which provide an interface to a linear system solver with multiple right-hand sides. The solver factorizes the system in a specified (possibly lower) precision and then refines the solution iteratively. cuSOLVER provides two sets of APIs for the Iterative Refinement Solver (IRS) functionality: one similar to LAPACK's GESV, and an 'expert' API that offers more configurable options, which the user sets through solver parameters. The examples perform the following steps for both APIs:
- Generating a random diagonally dominant matrix of the given type on the host
- Generating random right-hand side vectors for the linear system on the host
- Initializing the required CUDA and cuSOLVER objects and variables
- Allocating device memory for the input data and the solver's workspace buffer
- Copying the input data to the device
- Solving the system of equations
- Checking the returned error codes and solver information
- Releasing the resources used
Key concepts: Linear Solver, Factorization, Mixed Precision, Tensor Cores
Supported SM architectures: SM 7.0, SM 7.2, SM 7.5, SM 8.0, SM 8.6
Supported OSes: Linux, Windows
Supported CPU architectures: x86_64, ppc64le, arm64-sbsa
Prerequisites:
- A Linux or Windows system with recent NVIDIA drivers
- CMake 3.18 or newer
- CUDA Toolkit 10.2 or newer
Build command on Linux:
$ mkdir build
$ cd build
$ cmake ..
$ make
Make sure that CMake finds the expected CUDA Toolkit. If it does not, add the argument -DCMAKE_CUDA_COMPILER=/path/to/cuda-10.2/bin/nvcc to the cmake command.
Build command on Windows:
$ mkdir build
$ cd build
$ cmake -DCMAKE_GENERATOR_PLATFORM=x64 ..
Then open the cusolver_examples.sln project in Visual Studio and build it.
The build produces two binaries: cusolver_irs_expert uses the expert API for the gesv() function, and cusolver_irs_lapack uses the LAPACK-style API, whose interface is similar to the LAPACK GESV function.
Usage:
$ ./cusolver_irs_lapack
Sample output:
Generating matrix A on host...
make A diagonal dominant...
Generating matrix B on host...
Generating matrix X on host...
Initializing CUDA...
Allocating memory on device...
Workspace is 12591744 bytes
Solving matrix on device...
Solve info is: 0, iter is: 2
Releasing resources...
Done!
Usage:
$ ./cusolver_irs_expert
Sample output:
Generating matrix A on host...
make A diagonal dominant...
Generating matrix B on host...
Generating matrix X on host...
Initializing CUDA...
Setting up gesv() parameters...
Allocating memory on device...
Workspace is 12591744 bytes
Solving matrix on device...
Solve info is: 0, iter is: 2
Solved matrix 1024x1024 with 1 right hand sides in 19.6782ms
Releasing resources...
Done!