Forked from mp3guy/ICPCUDA
ICPCUDA

Super fast implementation of ICP in CUDA for devices of compute capability 3.0 or higher

Requires CUDA, Boost, Eigen and OpenCV. I've built it to take in raw TUM RGB-D datasets to do frame-to-frame dense ICP as an example application.

The code is a mishmash of my own stuff written from scratch, plus a bunch of random classes/types taken from PCL (on which the code does not depend :D). The slower version of ICP I compare to is the exact same version in PCL. In my benchmarks I have also found it to be faster than the SLAMBench implementation and hence the KFusion implementation. I have not tested against InfiniTAM.

The particular version of ICP implemented is the one introduced by KinectFusion. This means a three level coarse-to-fine registration pyramid, from 160x120 to 320x240 and finally 640x480 image sizes, with 4, 5 and 10 iterations per level respectively.

The fast ICP implementation, which is my own, essentially exploits the warp shuffle (shfl) instructions added in compute capability 3.0, which remove the need for shared memory and warp level synchronisation when exchanging values between threads.
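The halving pattern a shuffle-down reduction performs can be simulated in plain C++ (this models one 32-lane warp; on the GPU each exchange would be a shfl-down, with no memory traffic or explicit synchronisation):

```cpp
#include <array>

// Simulates the shuffle-down reduction a single 32-lane warp performs:
// at each step, lane i adds the value held by lane i + offset, and the
// offset halves until lane 0 holds the warp-wide sum.
float warpReduceSum(std::array<float, 32> lanes) {
    for (int offset = 16; offset > 0; offset /= 2) {
        for (int lane = 0; lane < 32 - offset; ++lane) {
            lanes[lane] += lanes[lane + offset];  // on GPU: shfl-down by `offset`
        }
    }
    return lanes[0];  // lane 0 holds the full sum after 5 steps
}
```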

Run it like so:

./ICP ~/Desktop/rgbd_dataset_freiburg1_desk/

Where ~/Desktop/rgbd_dataset_freiburg1_desk/ contains the association.txt file, with rgb first and depth second; the TUM RGB-D benchmark tools can generate this file.
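Each line of association.txt pairs an rgb frame with the depth frame nearest in time, rgb first. The lines look roughly like this (the timestamps below are illustrative, not taken from the dataset):

```
1305031452.791720 rgb/1305031452.791720.png 1305031452.791932 depth/1305031452.791932.png
1305031452.823674 rgb/1305031452.823674.png 1305031452.822989 depth/1305031452.822989.png
```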

The code will run both ICP implementations and output something like this on an NVIDIA GeForce GTX 780 Ti:

Fast ICP: 3.8693ms, Slow ICP: 6.1334ms
1.5852 times faster.

And something like this on an NVIDIA GeForce GTX 880M:

Fast ICP: 8.0522ms, Slow ICP: 11.3533ms
1.4100 times faster.

The main part to mess with is the thread/block sizes used, around line 339 of src/Cuda/icp.cu. Try what's best for you!
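Whatever block size you pick, the grid still has to cover the full image, so the launch arithmetic is a rounded-up division. A generic sketch (not the repo's exact launch code; `gridFor` is a made-up helper):

```cpp
// Ceil-division used to cover a whole image with fixed-size thread blocks,
// e.g. a 640x480 image with 32x8 blocks needs a 20x60 grid.
constexpr int divUp(int total, int block) { return (total + block - 1) / block; }

struct Dim2 { int x, y; };

Dim2 gridFor(int width, int height, Dim2 block) {
    return { divUp(width, block.x), divUp(height, block.y) };
}
```

Larger blocks mean fewer blocks but more register pressure per SM, which is why the best values vary between GPUs.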

The code will output two files, fast.poses and slow.poses. You can evaluate them on the TUM benchmark by using their tools. I get something like this:

python ~/stuff/Kinect_Logs/Freiburg/evaluate_ate.py ~/Desktop/rgbd_dataset_freiburg1_desk/groundtruth.txt fast.poses 
0.147061
python ~/stuff/Kinect_Logs/Freiburg/evaluate_ate.py ~/Desktop/rgbd_dataset_freiburg1_desk/groundtruth.txt slow.poses 
0.147113

The difference in values comes down to the fact that each method uses a different reduction scheme, and floating point addition is not associative.
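That two reorderings of the same sum can round differently is easy to demonstrate:

```cpp
// Floating point addition is not associative: summing the same three
// values in a different order can round differently, so two reduction
// schemes (e.g. tree vs. serial) legitimately disagree in the last digits.
bool associativityHolds(float a, float b, float c) {
    return (a + b) + c == a + (b + c);
}
```

For example, with a = 1.0f, b = 1e8f, c = -1e8f, the left grouping loses the 1.0 to rounding and yields 0, while the right grouping yields 1.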

Also, if you're using this code in academic work and it would be suitable to do so, please consider referencing some of my possibly relevant research in your literature review/related work section.
