Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing). This is the artifact of our CGO'2018 paper [ CVR: Efficient SpMV Vectorization on X86 Processors ].
CVR can be built simply with 'make'; the resulting binary file is 'spmv.cvr'.
Step: make
Our implementation of CVR supports sparse matrices in Matrix Market format, which is one of the default formats in the SuiteSparse Matrix Collection. Most of the data sets used in our paper can be found in one of these two collections:
1) [SuiteSparse Matrix Collection](https://sparse.tamu.edu) (formerly the University of Florida Sparse Matrix Collection).
2) [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data/) (SNAP).
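Before running CVR on a downloaded matrix, it can help to confirm that the file really is in Matrix Market form. The sketch below assumes the standard coordinate layout, in which lines beginning with `%` are the banner or comments and the first remaining line holds the row count, column count, and number of stored entries. `mtx_dims` is a hypothetical helper name used for illustration; it is not part of CVR.

```shell
# mtx_dims FILE: print "rows cols nnz" from a Matrix Market coordinate file.
# Hypothetical helper for illustration; not shipped with CVR.
mtx_dims() {
    # Skip the banner and comment lines (they start with '%') and blank lines,
    # then print the first remaining line, which is the size line.
    awk '!/^%/ && NF > 0 { print $1, $2, $3; exit }' "$1"
}
```

For example, `mtx_dims dataset/web-Google.mtx` would print the dimensions of the sample matrix, assuming it has been placed at that path.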
Here, we use web-Google as an example to show how to use CVR:
step 1: ./run_sample.sh
CVR accepts three parameters: file path, number of threads, and number of iterations.
In run_sample.sh, there is a command like this:
numactl --membind=1 ./spmv.cvr dataset/web-Google.mtx 68 1000
It means CVR reads a sparse matrix from "dataset/web-Google.mtx" and executes SpMV with 68 threads for 1000 iterations.
CVR will print two times in seconds: [Pre-processing time] and [SpMV Execution time].
[Pre-processing time] is the time spent converting a sparse matrix from CSR format to CVR format.
[SpMV Execution time] is the average time of running 1000 iterations of SpMV in CVR format. The value 1000 can be changed through the "Number of Iterations" parameter.
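Because the thread count is simply the second argument, a small loop can compare several settings on the same matrix. This is a sketch, not a script from the repository: `cvr_sweep` is a hypothetical name, the thread counts passed to it are example values, and setting `DRY_RUN=1` only prints the commands instead of running them.

```shell
# cvr_sweep MATRIX ITERATIONS THREADS...: run spmv.cvr once per thread count.
# Hypothetical helper; set DRY_RUN=1 to print the commands without running them.
cvr_sweep() {
    matrix=$1
    iters=$2
    shift 2
    for threads in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Show the command in the same form used in run_sample.sh.
            echo numactl --membind=1 ./spmv.cvr "$matrix" "$threads" "$iters"
        else
            numactl --membind=1 ./spmv.cvr "$matrix" "$threads" "$iters"
        fi
    done
}
```

For example, `DRY_RUN=1 cvr_sweep dataset/web-Google.mtx 1000 68 136 272` lists the three commands that would be run.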
MKL, CSR-I, and ESB depend on MKL.
Please make sure that MKL is already installed and the environment variable $MKL_ROOT is already set.
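A quick check before building can catch a missing MKL setup early. The function below is a minimal sketch that only tests whether `$MKL_ROOT` is set and names an existing directory; `check_mkl_root` is a hypothetical name, and the build script may perform its own checks.

```shell
# check_mkl_root: succeed only if MKL_ROOT is set and names an existing directory.
# Hypothetical pre-build check for illustration; not part of the repository.
check_mkl_root() {
    if [ -z "${MKL_ROOT:-}" ]; then
        echo "MKL_ROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$MKL_ROOT" ]; then
        echo "MKL_ROOT does not point to a directory: $MKL_ROOT" >&2
        return 1
    fi
    return 0
}
```

It could be combined with the build step as `check_mkl_root && ./build.sh`.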
We tried various thread counts and parameters for each format/solution and chose the configuration that achieved the best performance.
Step 1: cd ./solutions_for_comparison
Step 2: ./build.sh // build all formats/solutions
Step 3: ./run_comparison.sh // run all formats/solutions
(a) ./run_comparison.sh | grep 'Pre-processing' // get the Pre-processing time.
(b) ./run_comparison.sh | grep 'SpMV Execution' // get the SpMV execution time.
(c) ./run_comparison.sh | grep 'Throughput' // get the Throughput(GFlops).
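Each of the three pipelines above reruns the full comparison. One alternative, sketched here, is to save the output once and then filter the saved log. `comparison.log` and `show_metric` are hypothetical names; the patterns are the same strings used with grep above.

```shell
# show_metric LOG PATTERN: print the lines of a saved log that contain PATTERN.
# Hypothetical helper; the log is assumed to have been produced beforehand with
#   ./run_comparison.sh > comparison.log 2>&1
show_metric() {
    # -F treats the pattern as a fixed string rather than a regular expression.
    grep -F -- "$2" "$1"
}
```

For example, `show_metric comparison.log 'Throughput'` prints only the throughput lines without rerunning any benchmark.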
We now describe how to use each format/solution, so that you can change the parameters to fulfill your own requirements.
numactl --membind=1 ./spmv.csr5 [numThreads] [numIterations]
Sample: numactl --membind=1 ./spmv.csr5 204 1000
VHCC has many parameters. Since the width and height of blocks are essentially fixed at (512, 8192), we only expose the number of panels here.
numactl --membind=1 ./spmv.vhcc [numThreads] [numIterations] [numPanels]
Sample: numactl --membind=1 ./spmv.vhcc 272 1000 1
numactl --membind=1 ./spmv.csr [numThreads] [numIterations]
Sample: numactl --membind=1 ./spmv.csr 272 1000
ESB has different scheduling policies: static and dynamic. Use 1 for static, 2 for dynamic, and 3 for both.
numactl --membind=1 ./spmv.esb [numThreads] [numIterations] [schedule_policy]
Sample: numactl --membind=1 ./spmv.esb 272 1000 3
numactl --membind=1 ./spmv.mkl [numThreads] [numIterations]
Sample: numactl --membind=1 ./spmv.mkl 272 1000
Dependency: VTune
Step 1: cd ./solutions_for_comparison
Step 2: ./build.sh // If it has not been built yet
Step 3: ./run_locality.sh