This projects aims to implement Breadth First Search Algorithm on CUDA which would outperform simple sequential implementation.
Nvidia paper: https://research.nvidia.com/publication/scalable-gpu-graph-traversal (Merrill, Garland, Grimshaw)
- Create Graph class and test it
- Implement sequential BFS and test it
- Implement (simple) parallel BFS with CUDA and test it
- Implement Merill parallel BFS with CUDA and test it
- Performance comparison and conclusions
Simple, sequential Breadth First Search has O(|V| + |E|) complexity - we visit every vertex exactly once and every edge at most once. Therefore, it would be desirable for our parallel algorithm to have the same asymptotic complexity.
At first, we will consider quadratic, but sometimes quite effective algorithm. After that, we will consider simple parallelization strategy that has the above complexity. In each BFS level it simply traverses all vertexes in the queue in parallel and creates a new queue. New vertices are added using atomicAdd operation on the position in the output queue in each iteration.
Then, we will try to imlement more sophisticated algorithm. As described in Merill et. al our strategy is following:
- expand adjacent neighbors in parallel
- implement out-of-vore edge and vertex frontiers
- use local prefix sum in place of local atomic operations for determining enqueue offsets
- use a best-effort bitmask for efficient neighbor filtering
- gtest (https://github.com/google/googletest) for testing
- Compile the program with
./compile
- Run the program inputing data source. For example
./bfs_main < data/4elt.graph
The source code has options to read graphs provided both as AdjacencyList and list of edges and this is specified by Format argument in a constructor.
- D. Merill, M. Garland, A. Grimshaw "Scalable GPU Graph Traversal": https://research.nvidia.com/publication/scalable-gpu-graph-traversal
- Github user rafalk342 (https://github.com/rafalk342/bfs-cuda) whose repo helped me get started with implementation of my first code running on GPU with cuda.