CVR

Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing). This is the artifact of our CGO 2018 paper [CVR: Efficient SpMV Vectorization on X86 Processors].

Build

CVR can be built simply with 'make'; the resulting binary is 'spmv.cvr'.

	Step: make       

Dataset Preparation and Execution

Our implementation of CVR supports sparse matrices in Matrix Market format, which is one of the default formats of the SuiteSparse Matrix Collection. Most of the datasets used in our paper can be found in one of these two collections:

	1) [SuiteSparse Matrix Collection](https://sparse.tamu.edu) (formerly the University of Florida Sparse Matrix Collection).
	2) [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data/) (SNAP).
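
For reference, a Matrix Market (.mtx) file in coordinate format looks like the minimal hand-made example below (not one of the datasets above): a header line, optional comment lines, a size line giving the number of rows, columns, and non-zeros, and then one 1-indexed non-zero entry per line.

	%%MatrixMarket matrix coordinate real general
	% rows columns non-zeros
	3 3 4
	1 1 1.0
	2 2 2.0
	3 1 3.0
	3 3 4.5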

Here, we use web-Google as an example to show how to use CVR:

	step 1: ./run_sample.sh

CVR accepts three parameters: the matrix file path, the number of threads, and the number of iterations.
In run_sample.sh, there is a command like this:

	numactl --membind=1 ./spmv.cvr dataset/web-Google.mtx 68 1000

This command tells CVR to read the sparse matrix from "dataset/web-Google.mtx" and execute SpMV with 68 threads for 1000 iterations.

CVR prints two times, both in seconds: [Pre-processing time] and [SpMV Execution time].
[Pre-processing time] is the time taken to convert a sparse matrix from CSR format to CVR format.
[SpMV Execution time] is the average time over the 1000 SpMV iterations executed with the CVR format. Note that 1000 can be changed via the "Number of Iterations" parameter.
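
For instance, to pull just these two numbers out of a single run (this assumes the output labels above, the same ones the grep commands in the comparison section rely on):

	numactl --membind=1 ./spmv.cvr dataset/web-Google.mtx 68 1000 | grep -E 'Pre-processing|SpMV Execution'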

Compare CVR with other formats/solutions

MKL, CSR-I, and ESB depend on Intel MKL.
Please make sure that MKL is installed and that the environment variable $MKL_ROOT is set.

We tried various thread counts and parameters for each format/solution and chose the configuration that achieves the best performance; a sketch of such a sweep follows the steps below.

	Step 1: cd ./solutions_for_comparison

	Step 2: ./build.sh        // build all formats/solutions

	Step 3: ./run_comparison.sh     // run all formats/solutions 
	(a)     ./run_comparison.sh | grep 'Pre-processing'      // get the Pre-processing time. 
	(b)     ./run_comparison.sh | grep 'SpMV Execution'      // get the SpMV execution time. 
	(c)     ./run_comparison.sh | grep 'Throughput'          // get the Throughput(GFlops).
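
As a sketch of the parameter sweep mentioned above, the loop below tries a few thread counts for one solution and keeps only the throughput lines. The thread counts are illustrative (1, 2, 3, and 4 threads per core on a 68-core KNL), and it assumes the binaries themselves print the 'Throughput' line that run_comparison.sh greps for.

	# Sweep thread counts for CSR5 and report the throughput of each run.
	for t in 68 136 204 272; do
		echo "threads = $t"
		numactl --membind=1 ./spmv.csr5 $t 1000 | grep 'Throughput'
	done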

Below we explain how to use each format/solution, so that you can change the parameters to fulfill your own requirements.

CSR5

	numactl --membind=1 ./spmv.csr5 [numThreads] [numIterations]

	Sample: numactl --membind=1 ./spmv.csr5 204 1000

VHCC

VHCC has many parameters. Since the block width and height are effectively fixed at (512, 8192), we only expose the number of panels here.

	numactl --membind=1 ./spmv.vhcc [numThreads] [numIterations] [numPanels]

	Sample: numactl --membind=1 ./spmv.vhcc 272 1000 1

CSR-I

	numactl --membind=1 ./spmv.csr [numThreads] [numIterations]

	Sample: numactl --membind=1 ./spmv.csr 272 1000

ESB

ESB has two scheduling policies: static and dynamic. Pass 1 for static, 2 for dynamic, or 3 for both.

	numactl --membind=1 ./spmv.esb [numThreads] [numIterations] [schedule_policy]

	Sample: numactl --membind=1 ./spmv.esb 272 1000 3

MKL

	numactl --membind=1 ./spmv.mkl [numThreads] [numIterations]

	Sample: numactl --membind=1 ./spmv.mkl 272 1000

Cache Performance Profiling (Additional)

Dependency: Intel VTune

	Step 1: cd ./solutions_for_comparison
	
	Step 2: ./build.sh                 // If it has not been built yet

	Step 3: ./run_locality.sh
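
run_locality.sh presumably drives the VTune collection for you. If you want to profile a single run by hand instead, a command along the following lines may work; this is only a sketch, assuming the legacy amplxe-cl command-line interface and the memory-access analysis type, both of which vary across VTune versions.

	# Collect a memory-access profile of one CVR run (assumes amplxe-cl is on PATH).
	amplxe-cl -collect memory-access -- numactl --membind=1 ./spmv.cvr dataset/web-Google.mtx 68 1000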
