Skip to content
/ CVR Public
forked from puckbee/CVR

Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL)

License

Notifications You must be signed in to change notification settings

bripage/CVR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CVR

Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing).
This is the artifact of our CGO'2018 paper [ CVR: Efficient SpMV Vectorization on X86 Processors ].

Build

CVR can be built simply with 'make', while the resulted binariy file is 'spmv.cvr'.

Step: make       

Dataset Preparation and Execution

Our implementation of CVR supports sparse matrices with matrix market format, which is one of the default formats in SuiteSparse Matrix Collection. Most of the data sets used in our paper can be found in either of these two collections:

  1. SuiteSparse Matrix Collection (formerly the University of Florida Sparse Matrix Collection).
  2. Stanford Large Network Dataset Collection (SNAP).

Here, we use web-Google for example to show how to use CVR:

step 1: ./run_sample.sh

The CVR accepts three parameters: file path; Number of Threads; Number of Iterations.
In run_sample.sh, there is a command like this:

numactl --membind=1 ./spmv.cvr [filepath] [numThreads] [numIterations]

Sample: numactl --membind=1 ./spmv.cvr dataset/web-Google.mtx 68 1000

CVR will print two times in seconds: [Pre-processing time] and [SpMV Execution time].
[Pre-processing time] is the time of converting a sparse matrix with CSR format to CVR format.
[SpMV Execution time] is the average time of running 1000 iterations of SpMV with CVR format. Note that 1000 can be changed by changing "Number of Iterations"

Compare CVR with Other Formats/Solutions

MKL,CSR-I and ESB are dependent on MKL.
Please make sure that MKL is already installed and the environment variable $MKL_ROOT is already set.

We tried various threads numbers and parameters for each format/solution and choose the configuration that achieves the best performance.
You can try to setup different threads numbers in run_comparison.sh, we will elaborate how to do this later.
But if you just want to have a reproduce the experiment results of web-Google, these three steps can definitely meet your need.

Step 1: cd ./solutions_for_comparison

Step 2: ./build.sh        // build all formats/ solutions

Step 3: ./run_comparison.sh ../dataset/web-Google.mtx                           // run all formats/solutions 
(a)     ./run_comparison.sh ../dataset/web-Google.mtx  | grep 'Pre-processing'  // get the Pre-processing time. 
(b)     ./run_comparison.sh ../dataset/web-Google.mtx  | grep 'SpMV Execution'  // get the SpMV execution time. 
(c)     ./run_comparison.sh ../dataset/web-Google.mtx  | grep 'Throughput'      // get the Throughput(GFlops).

We will elaborate how to use each format/solution, so that you can change the configuration to fullfill your own requirements.

CSR5

numactl --membind=1 ./bin/spmv.csr5 [filepath] [numThreads] [numIterations]

Sample: numactl --membind=1 ./spmv.csr5 ../dataset/web-Google.mtx 204 1000

VHCC

VHCC has many parameters. Since the width and height of blocks is pretty fixed to be (512,8192), we only provide the number of panels here.

numactl --membind=1 ./bin/spmv.vhcc [filepath] [numThreads] [numIterations] [numPanels]

Sample: numactl --membind=1 ./spmv.vhcc ../dataset/web-Google.mtx 272 1000 1

CSR-I

numactl --membind=1 ./bin/spmv.csr [filepath] [numThreads] [numIterations]

Sample: numactl --membind=1 ./spmv.csr ../dataset/web-Google.mtx 272 1000

ESB

ESB has diffent schedule policies: static and dynamic. 1 for static; 2 for dynamic; 3 for both two.

numactl --membind=1 ./bin/spmv.esb [filepath] [numThreads] [numIterations] [schedule_policy]

Sample: numactl --membind=1 ./spmv.esb ../dataset/web-Google.mtx 272 1000 3

MKL

numactl --membind=1 ./bin/spmv.mkl [filepath] [numThreads] [numIterations]

Sample: numactl --membind=1 ./spmv.mkl ../dataset/web-Google.mtx 272 1000

Cache Performance Profiling (Additional)

Dependency: Vtune

Step 1: cd ./solutions_for_comparison
	
Step 2: ./build.sh                 // If it has not been built yet

Step 3: ./run_locality.sh [filepath][nT_CVR][nT_CSR5][nT_VHCC][nPanels][nT_CSRI][nT_ESB][schedule_ESB][nT_MKL]
        ./run_locality.sh ../dataset/web-Google.mtx 68 204 272 1 272 272 1 272

Note that 'nT' stands for numThreads, while 'nPanels' stands for numPanels of VHCC.

Notes

We only modified the source code of CSR5 and VHCC to format the output messages.
The source code of CSR5[ICS'15], please refer to (https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR5)
The source code of VHCC[CGO'15], please refer to (https://github.com/vhccspmv/vhcc)

About

Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 81.1%
  • C 13.5%
  • Shell 3.7%
  • Makefile 1.7%