The official Faster R-CNN code (written in MATLAB) is available here. If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code.
This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R-CNN. There are slight differences between the two implementations. In particular, this Python port:
- is ~10% slower at test-time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16)
- gives similar, but not exactly the same, mAP as the MATLAB version
- is not compatible with models trained using the MATLAB code due to the minor implementation differences
- includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) -- see these slides for more information
By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research)
This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship.
Please see the official README.md for more details.
Faster R-CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015.
Faster R-CNN is released under the MIT License (refer to the LICENSE file for details).
If you find Faster R-CNN useful in your research, please consider citing:
@inproceedings{renNIPS15fasterrcnn,
Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
Title = {Faster {R-CNN}: Towards Real-Time Object Detection
with Region Proposal Networks},
Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
Year = {2015}
}
- Requirements: software
- Requirements: hardware
- Basic installation
- Demo
- Beyond the demo: training and testing
- Usage
1. Requirements for Caffe and pycaffe (see: Caffe installation instructions)
Note: Caffe must be built with support for Python layers!
# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1
You can download my Makefile.config for reference.
2. Python packages you might not have: cython, python-opencv, easydict
3. [Optional] MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code.
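Before building, it can help to confirm that the Python packages above are importable. The following is a minimal, hypothetical check (Python 3, standard library only), not part of the repository. Note that import names differ from package names: python-opencv is imported as `cv2` and cython as `Cython`. pycaffe is left out on purpose, because it only becomes importable once `caffe-fast-rcnn/python` is on the Python path.

```python
import importlib.util

# Package name (as listed above) -> module name used in `import` statements.
REQUIRED_MODULES = {
    "cython": "Cython",
    "python-opencv": "cv2",
    "easydict": "easydict",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found on sys.path."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing Python packages: " + ", ".join(missing))
    else:
        print("All required Python packages are importable.")
```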
- For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices
- For training Fast R-CNN with VGG16, you'll need a K40 (~11G of memory)
- For training the end-to-end version of Faster R-CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN)
1. Clone the Faster R-CNN repository
# Make sure to clone with --recursive
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
2. We'll call the directory that you cloned Faster R-CNN into FRCN_ROOT
Ignore notes 1 and 2 if you followed step 1 above.
Note 1: If you didn't clone Faster R-CNN with the --recursive flag, then you'll need to manually clone the caffe-fast-rcnn submodule: git submodule update --init --recursive
Note 2: The caffe-fast-rcnn submodule needs to be on the faster-rcnn branch (or equivalent detached state). This will happen automatically if you followed the step 1 instructions.
3. Build the Cython modules
cd $FRCN_ROOT/lib
make
4. Build Caffe and pycaffe
cd $FRCN_ROOT/caffe-fast-rcnn
# Now follow the Caffe installation instructions here:
#   http://caffe.berkeleyvision.org/installation.html
# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make -j8 && make pycaffe
5. Download pre-computed Faster R-CNN detectors
cd $FRCN_ROOT
./data/scripts/fetch_faster_rcnn_models.sh
This will populate the $FRCN_ROOT/data folder with faster_rcnn_models. See data/README.md for details. These models were trained on VOC 2007 trainval.
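A quick way to confirm that the installation steps worked is to check that the caffe-fast-rcnn submodule directory is not empty (an uninitialized submodule is left as an empty folder) and that the downloaded detectors are present. This is a hypothetical helper (Python 3), not part of the repository; it assumes FRCN_ROOT is exported as an environment variable and that the fetched models use the `.caffemodel` extension.

```python
import glob
import os


def submodule_checked_out(frcn_root):
    """True if caffe-fast-rcnn exists and contains at least one entry."""
    path = os.path.join(frcn_root, "caffe-fast-rcnn")
    return os.path.isdir(path) and len(os.listdir(path)) > 0


def find_detectors(frcn_root):
    """List .caffemodel files under data/faster_rcnn_models (empty if absent)."""
    model_dir = os.path.join(frcn_root, "data", "faster_rcnn_models")
    if not os.path.isdir(model_dir):
        return []
    pattern = os.path.join(model_dir, "*.caffemodel")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


if __name__ == "__main__":
    root = os.environ.get("FRCN_ROOT", ".")
    print("caffe-fast-rcnn checked out:", submodule_checked_out(root))
    print("pre-computed detectors:", find_detectors(root) or "none found")
```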
After successfully completing basic installation, you'll be ready to run the demo.
To run the demo
cd $FRCN_ROOT
./tools/demo.py
The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007.
1. Download the training, validation, and test data and the VOCdevkit
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
2. Extract all of these tars into one directory named VOCdevkit
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
3. It should have this basic structure
$VOCdevkit/ # development kit
$VOCdevkit/VOCcode/ # VOC utility code
$VOCdevkit/VOC2007 # image sets, annotations, etc.
# ... and several other directories ...
4. Create symlinks for the PASCAL VOC dataset
cd $FRCN_ROOT/data
ln -s $VOCdevkit VOCdevkit2007
Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects.
5. [Optional] Follow similar steps to get PASCAL VOC 2010 and 2012
6. [Optional] If you want to use COCO, please see some notes under data/README.md
7. Follow the next sections to download pre-trained ImageNet models
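The extract, verify, and symlink steps above can also be scripted. The following is a minimal sketch using only the Python 3 standard library; all function names are hypothetical. It assumes the three tars unpack into a top-level VOCdevkit/ folder (as running `tar xvf` in one directory does) and that FRCN_ROOT points at your clone.

```python
import os
import tarfile

VOC2007_TARS = [
    "VOCtrainval_06-Nov-2007.tar",
    "VOCtest_06-Nov-2007.tar",
    "VOCdevkit_08-Jun-2007.tar",
]

# Directories this sketch expects inside $VOCdevkit
# (VOCcode plus the standard VOC2007 sub-folders).
EXPECTED_DIRS = [
    "VOCcode",
    os.path.join("VOC2007", "Annotations"),
    os.path.join("VOC2007", "ImageSets", "Main"),
    os.path.join("VOC2007", "JPEGImages"),
]


def extract_all(tar_paths, dest):
    """Equivalent of running `tar xvf` on each archive inside `dest`."""
    for path in tar_paths:
        with tarfile.open(path) as tar:
            tar.extractall(dest)


def missing_voc_dirs(devkit):
    """Return the expected sub-directories that are absent from `devkit`."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(devkit, d))]


def link_devkit(devkit, frcn_root, link_name="VOCdevkit2007"):
    """Create $FRCN_ROOT/data/<link_name> -> devkit, like `ln -s`; idempotent."""
    link = os.path.join(frcn_root, "data", link_name)
    target = os.path.abspath(devkit)
    if os.path.islink(link):
        if os.path.realpath(link) == os.path.realpath(target):
            return link  # already points at this devkit
        raise FileExistsError("{} already points elsewhere".format(link))
    os.symlink(target, link)
    return link


if __name__ == "__main__":
    extract_all(VOC2007_TARS, ".")
    devkit = os.path.abspath("VOCdevkit")
    missing = missing_voc_dirs(devkit)
    if missing:
        raise SystemExit("VOCdevkit is incomplete, missing: {}".format(missing))
    print("Linked", link_devkit(devkit, os.environ["FRCN_ROOT"]))
```

Using an absolute target path keeps the symlink valid regardless of the directory it was created from, which matters if the same devkit is shared between several projects.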
Pre-trained ImageNet models can be downloaded for the networks described in the paper: ZF and VGG16.
cd $FRCN_ROOT
./data/scripts/fetch_imagenet_models.sh
VGG16 comes from the Caffe Model Zoo, but is provided here for your convenience. ZF was trained at MSRA.
To train and test a Faster R-CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh.
Output is written underneath $FRCN_ROOT/output.
cd $FRCN_ROOT
./experiments/scripts/faster_rcnn_alt_opt.sh [GPU_ID] [NET] [--set ...]
# GPU_ID is the GPU you want to train on
# NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use
# --set ... allows you to specify fast_rcnn.config options, e.g.
# --set EXP_DIR seed_rng1701 RNG_SEED 1701
("alt opt" refers to the alternating optimization training algorithm described in the NIPS paper.)
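If you launch experiments from Python (for example from a notebook), building the argument list explicitly avoids shell-quoting mistakes. This is a hypothetical wrapper around the script invocation shown above, not part of the repository; it assumes it runs with FRCN_ROOT as the working directory and uses `shlex.join`, which requires Python 3.8 or later.

```python
import shlex
import subprocess

VALID_NETS = ("ZF", "VGG_CNN_M_1024", "VGG16")


def alt_opt_command(gpu_id, net, overrides=None):
    """Build the argv for faster_rcnn_alt_opt.sh [GPU_ID] [NET] [--set ...]."""
    if net not in VALID_NETS:
        raise ValueError("NET must be one of {}".format(VALID_NETS))
    cmd = ["./experiments/scripts/faster_rcnn_alt_opt.sh", str(gpu_id), net]
    if overrides:
        cmd.append("--set")
        # Each config option becomes a KEY VALUE pair after --set.
        for key, value in overrides.items():
            cmd.extend([key, str(value)])
    return cmd


def run(cmd, frcn_root):
    """Run the script from FRCN_ROOT; raises CalledProcessError on failure."""
    subprocess.run(cmd, cwd=frcn_root, check=True)


if __name__ == "__main__":
    cmd = alt_opt_command(0, "VGG16", {"EXP_DIR": "seed_rng1701", "RNG_SEED": 1701})
    print(shlex.join(cmd))
```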
To train and test a Faster R-CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh.
Output is written underneath $FRCN_ROOT/output.
cd $FRCN_ROOT
./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] [NET] [--set ...]
# GPU_ID is the GPU you want to train on
# NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use
# --set ... allows you to specify fast_rcnn.config options, e.g.
# --set EXP_DIR seed_rng1701 RNG_SEED 1701
This method trains the RPN module jointly with the Fast R-CNN network, rather than alternating between training the two. It results in faster training (~1.5x speedup) and similar detection accuracy. See these slides for more details.
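The `--set` pairs accepted by both scripts specify fast_rcnn.config options. The sketch below illustrates the general idea of such an override and is not the library's code: values are parsed with `ast.literal_eval`, so `1701` becomes an integer, anything that is not a Python literal (such as `seed_rng1701`) is kept as a string, and dotted keys address nested options. The type check and the sample config dictionary are assumptions made for illustration (Python 3).

```python
from ast import literal_eval


def apply_overrides(cfg, pairs):
    """Apply KEY VALUE pairs (as given after --set) to a nested config dict."""
    if len(pairs) % 2 != 0:
        raise ValueError("--set expects an even number of arguments (KEY VALUE ...)")
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError for unknown option groups
        if leaf not in node:
            raise KeyError("Unknown config option: {}".format(key))
        try:
            value = literal_eval(raw)  # "1701" -> 1701, "(600,)" -> (600,)
        except (ValueError, SyntaxError):
            value = raw  # bare words such as "seed_rng1701" stay strings
        if type(value) is not type(node[leaf]):
            raise TypeError("{}: expected {}, got {}".format(
                key, type(node[leaf]).__name__, type(value).__name__))
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    # Hypothetical defaults, for illustration only.
    cfg = {"EXP_DIR": "default", "RNG_SEED": 3, "TRAIN": {"SCALES": (600,)}}
    apply_overrides(cfg, ["EXP_DIR", "seed_rng1701", "RNG_SEED", "1701"])
    print(cfg)
```

Rejecting type changes catches typos such as passing a string where a number was expected, at the cost of requiring values to be written as valid Python literals.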
Artifacts generated by the scripts in tools are written in the output directory ($FRCN_ROOT/output).
Trained Fast R-CNN networks are saved under:
output/<experiment directory>/<dataset name>/
Test outputs are saved under:
output/<experiment directory>/<dataset name>/<network snapshot name>/
First, create a PASCAL VOC-style dataset using the Images and Dataset notebook. As an example, grotoap2 is used here as the dataset name. The dataset should have the following structure:
grotoap2/ (replace with your dataset name)
grotoap2/Annotations/ <- will contain annotations
grotoap2/ImageSets/Main/ <- will contain text files indicating train/val/test splits
grotoap2/JPEGImages/ <- copy all the images here
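If you need to create the split files in ImageSets/Main by hand, the usual VOC convention is one text file per split (train.txt, val.txt, trainval.txt, test.txt) listing one image ID (the file name without extension) per line. The sketch below is a hypothetical helper (Python 3) that builds such splits from JPEGImages with a fixed random seed; the split fractions are arbitrary assumptions.

```python
import os
import random


def split_ids(image_ids, val_frac=0.1, test_frac=0.2, seed=0):
    """Deterministically partition image IDs into VOC-style splits."""
    ids = sorted(image_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return {
        "train": sorted(train),
        "val": sorted(val),
        "trainval": sorted(train + val),
        "test": sorted(test),
    }


def write_splits(dataset_root, **kwargs):
    """Write ImageSets/Main/<split>.txt for every .jpg in JPEGImages."""
    img_dir = os.path.join(dataset_root, "JPEGImages")
    image_ids = [os.path.splitext(f)[0] for f in os.listdir(img_dir)
                 if f.lower().endswith((".jpg", ".jpeg"))]
    splits = split_ids(image_ids, **kwargs)
    main_dir = os.path.join(dataset_root, "ImageSets", "Main")
    os.makedirs(main_dir, exist_ok=True)
    for name, ids in splits.items():
        with open(os.path.join(main_dir, name + ".txt"), "w") as fh:
            fh.writelines(i + "\n" for i in ids)
    return splits
```

For example, `write_splits("grotoap2")` run from the folder containing the dataset would create the four split files.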
Then symlink the dataset in the same way as is done for the VOC2007 devkit in the above example.
Add the dataset entry to lib/datasets/factory.py as follows:
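# At the top of lib/datasets/factory.py (assumes the dataset file is lib/datasets/grotoap2.py):
from datasets.grotoap2 import grotoap2

# Set up grotoap2 (replace name)
for split in ['train', 'val', 'trainval', 'test']:
    name = 'grotoap2_{}'.format(split)
    __sets[name] = (lambda split=split: grotoap2(split))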
Then create the dataset file in lib/datasets. The easiest way is to copy the grotoap2 file and replace every occurrence of grotoap2 with the new dataset name. The classes contained in the dataset should be specified on line 32 of that file.
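The `split=split` default argument in the factory.py lambda matters: Python closures look up loop variables when the lambda is called, not when it is defined, so without it every entry would construct the last split ('test'). The self-contained sketch below shows the same registration pattern with a hypothetical stand-in class for the dataset; the `get_imdb`-style lookup is written here for illustration.

```python
class grotoap2(object):
    """Hypothetical stand-in for the imdb class defined in lib/datasets/grotoap2.py."""

    def __init__(self, image_set):
        self.image_set = image_set


__sets = {}

# Registration pattern from factory.py: the default argument freezes `split`.
for split in ['train', 'val', 'trainval', 'test']:
    name = 'grotoap2_{}'.format(split)
    __sets[name] = (lambda split=split: grotoap2(split))

# Without the default argument, every lambda sees the final value of `split`.
late_bound = {}
for split in ['train', 'test']:
    late_bound[split] = lambda: grotoap2(split)


def get_imdb(name):
    """Construct a registered dataset by name (e.g. 'grotoap2_trainval')."""
    if name not in __sets:
        raise KeyError('Unknown dataset: {}'.format(name))
    return __sets[name]()


if __name__ == "__main__":
    print(get_imdb('grotoap2_train').image_set)  # train
    print(late_bound['train']().image_set)       # test
```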
Copy the models/grotoap2 folder to models/[your_name] and make the following adjustments so that the network's classification and bounding-box regression layers match the number of classes in your dataset (counting the background class):
VGG16/faster_rcnn_end2end/train.prototxt:526 param_str: "'num_classes': 24" -> param_str: "'num_classes': {number_of_classes}"
VGG16/faster_rcnn_end2end/train.prototxt:639 num_output: 96 -> num_output: {number_of_classes} * 4
VGG16/faster_rcnn_end2end/test.prototxt:567 num_output: 24 -> num_output: {number_of_classes}
VGG16/faster_rcnn_end2end/test.prototxt:592 num_output: 96 -> num_output: {number_of_classes} * 4
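The values in these edits follow from the class list in your dataset file. As a minimal sketch, assuming the class tuple starts with '__background__' as in py-faster-rcnn's pascal_voc.py, the class count and the box-regression output size can be computed as below. `patch_param_str` is a hypothetical helper that only rewrites the `'num_classes': N` strings; `num_output` appears in many layers, so those values should still be edited by hand at the listed lines.

```python
import re


def head_sizes(classes):
    """Sizes for the detection head given the full class tuple (background first)."""
    if not classes or classes[0] != '__background__':
        raise ValueError("expected '__background__' as the first class")
    n = len(classes)
    return {
        "num_classes": n,                # replaces the 24s in the edits above
        "bbox_pred_num_output": 4 * n,   # replaces the 96s (4 box coordinates per class)
    }


def patch_param_str(prototxt_text, num_classes):
    """Replace every "'num_classes': <int>" occurrence in a prototxt string."""
    return re.sub(r"'num_classes': \d+",
                  "'num_classes': {}".format(num_classes),
                  prototxt_text)


if __name__ == "__main__":
    # Hypothetical class names: background + 23 object classes.
    classes = ('__background__',) + tuple('class_{}'.format(i) for i in range(23))
    print(head_sizes(classes))  # {'num_classes': 24, 'bbox_pred_num_output': 96}
```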
Add an entry to the case statement in experiments/scripts/faster_rcnn_end2end.sh to include your dataset, like so:
  grotoap2)
    TRAIN_IMDB="grotoap2_trainval"
    TEST_IMDB="grotoap2_test"
    PT_DIR="grotoap2"
    ITERS=70000
    ;;
Change the ITERS parameter to train for more or fewer iterations. Note that this also changes the name of the trained model needed for testing from vgg16_faster_rcnn_iter_70000.caffemodel to vgg16_faster_rcnn_iter_[ITERS].caffemodel.
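To avoid mismatches between ITERS and the snapshot passed to test_net.py, the path can be derived from the same values. This is a hypothetical helper; the `vgg16_faster_rcnn` prefix is taken from the file name mentioned for ITERS, and the directory layout follows output/<experiment directory>/<dataset name>/ as described for trained networks.

```python
import os


def snapshot_path(output_root, exp_dir, imdb_name, iters, prefix="vgg16_faster_rcnn"):
    """Path of the final snapshot written after `iters` training iterations."""
    filename = "{}_iter_{}.caffemodel".format(prefix, iters)
    return os.path.join(output_root, exp_dir, imdb_name, filename)


if __name__ == "__main__":
    # Values taken from the manual test command in this README.
    print(snapshot_path("output", "grotoap2", "grotoap2__trainval", 70000))
```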
The network can then be trained by running the following command from the py-faster-rcnn folder:
./experiments/scripts/faster_rcnn_end2end.sh 2 VGG16 [your_name]
This trains Faster R-CNN on GPU 2 and automatically tests it once training is done. The test outputs a UID, which can be used to calculate the performance in the Results notebook. The test can also be run manually with the following command:
./tools/test_net.py --gpu 1 --def models/grotoap2/VGG16/faster_rcnn_end2end/test.prototxt --net /home/student/py-faster-rcnn/output/grotoap2/grotoap2__trainval/vgg16_faster_rcnn_iter_70000.caffemodel --imdb grotoap2_test --cfg experiments/cfgs/faster_rcnn_end2end.yml
Replace all instances of grotoap2 with your dataset name. Among other output, this prints something like:
Writing table grotoap2 results file
/home/student/py-faster-rcnn/data/grotoap2/results/grotoap2/Main/comp4_5bab918c-b041-4803-af67-4d0ecc4d35e7_det_test_table.txt
Copy the specific UID (4_5bab918c-b041-4803-af67-4d0ecc4d35e7) and use this to calculate the results and visualize the predictions in the notebooks.
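Rather than copying the UID by hand, it can be read from the results file name. A minimal sketch, assuming the file name always has the form comp<UID>_det_... as in the example output above; `extract_uid` is a hypothetical helper, not part of the notebooks.

```python
import os
import re

# "comp" + UID + "_det_" + rest; the non-greedy group stops at the first "_det_".
_RESULT_NAME = re.compile(r"^comp(?P<uid>.+?)_det_")


def extract_uid(results_path):
    """Return the UID part of a results file name such as comp4_<uuid>_det_..."""
    match = _RESULT_NAME.match(os.path.basename(results_path))
    if match is None:
        raise ValueError("not a results file name: {}".format(results_path))
    return match.group("uid")


if __name__ == "__main__":
    path = ("/home/student/py-faster-rcnn/data/grotoap2/results/grotoap2/Main/"
            "comp4_5bab918c-b041-4803-af67-4d0ecc4d35e7_det_test_table.txt")
    print(extract_uid(path))  # 4_5bab918c-b041-4803-af67-4d0ecc4d35e7
```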