This project proposes an object detection network that uses deep learning to predict, in real time, whether an RGB video contains a hand grasping an object.
Pre-Publication Paper: https://drive.google.com/file/d/1YBY8jsC4y6fuyIjIW1ykdgBmoluEp3AG/view?usp=sharing
Presentation Video: https://www.youtube.com/watch?v=y7nI9wQG0e8
Grab detection is the task of detecting hands that are grasping objects.
This is a fork of AlexeyAB's implementation of YOLOv4 on darknet, linked here: https://github.com/AlexeyAB/darknet
First, install the required dependencies as described in the YOLOv4 GitHub repository: https://github.com/AlexeyAB/darknet#requirements
Alternatively, Augmented Startups provides a two-part, step-by-step guide for Windows on YouTube: https://www.youtube.com/watch?v=5pYh1rFnNZs
https://www.youtube.com/watch?v=sUxAVpzZ8hU In this video, clone this repository instead of the YOLOv4 repository.
https://drive.google.com/file/d/1B9WDT8EKs0NLcTynmniGzeyvvGuh_Fcc/view?usp=sharing
Place the trained weights (yolo-obj_best.weights) in: ~/darknet/build/darknet/x64/backup
Then run the demo from: ~/darknet/build/darknet/x64
darknet.exe detector demo data/obj.data yolo-obj.cfg backup/yolo-obj_best.weights filename_of_your_video
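For scripted or batch runs, the demo command above can also be assembled and launched from Python. The following is a minimal sketch, not part of this repository: the helper names `build_demo_command` and `run_demo` are hypothetical, and it assumes the script is started from the directory that contains `darknet.exe`, with the data file, config, and weights at the relative paths used in the command.

```python
import subprocess
from pathlib import Path


def build_demo_command(video, exe="darknet.exe", data="data/obj.data",
                       cfg="yolo-obj.cfg",
                       weights="backup/yolo-obj_best.weights"):
    """Return the argument list for the `detector demo` command (hypothetical helper)."""
    return [exe, "detector", "demo", data, cfg, weights, str(video)]


def run_demo(video):
    """Run the demo on one video, assuming the current directory holds darknet.exe."""
    cmd = build_demo_command(video)
    # Fail early with a clear message if the weights file is missing.
    weights = Path(cmd[5])
    if not weights.is_file():
        raise FileNotFoundError(f"weights not found: {weights}")
    # check=True raises CalledProcessError if darknet exits with an error code.
    return subprocess.run(cmd, check=True)
```

Calling `run_demo("filename_of_your_video")` would then launch the same command as above. On a non-Windows build, the `exe` argument of `build_demo_command` would need to point at the local darknet binary instead.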
A human grasping dataset taken from different angles.
https://drive.google.com/file/d/1xeTGrnWud8X1A9PuonK_mwIHl6UsHeUr/view?usp=sharing
https://drive.google.com/open?id=1Hs_dKiOXMXJupfJTYxankmKEhLa_U2Q0
https://drive.google.com/open?id=1L9LAARDvmwcIoDtLduz9YnWYOrOeSDeK
Average Precision (AP) results at an IoU threshold of 0.5
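For reference, IoU (intersection over union) measures how well a predicted box overlaps a ground-truth box. The following is a generic sketch, not code from this repository: the function name `iou` and the corner-format boxes `(x1, y1, x2, y2)` are assumptions made for illustration (darknet label files store normalized center, width, and height instead).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region; zero if the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union is the sum of both areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7 (about 0.14), so at a threshold of 0.5 that prediction would not count as a match for the ground-truth box.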
Average FPS on Videos
Comparison with other object detection architectures on the task of grab detection: