This repository fine-tunes a pre-trained Faster R-CNN model, with a ResNet-50 backbone, on an annotated dataset with three classes (aircraft, person, ship).
conda env create -f environment.yml
conda activate object_detection
To extract every Nth frame from the video, run
python data_collection.py --extract --nth_frame N
This will load the video in data/raw and save every Nth frame to data/frames.
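The extraction step presumably amounts to a frame-reading loop like the following sketch, written with OpenCV. The function and file names are illustrative, not the repository's actual code:

```python
from pathlib import Path


def keep_frame(idx, n):
    """True if frame index idx falls on an Nth-frame boundary."""
    return idx % n == 0


def extract_frames(video_path, out_dir, n):
    """Save every Nth frame of video_path as a JPEG in out_dir."""
    import cv2  # local import keeps keep_frame usable without OpenCV installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if keep_frame(idx, n):
            cv2.imwrite(str(out / f"frame_{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()


# Example (hypothetical file name):
# extract_frames("data/raw/video.mp4", "data/frames", n=10)
```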
You can then annotate this data using a tool of your choice, but the results must be exported in COCO format.
To split the dataset into train/valid sets, with a set percentage of the data in the training set, run
python data_collection.py --split --percent_train 0.8
This will create train.json and valid.json files in data/annotated/
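A COCO-format split typically shuffles the image list and partitions the annotations by image id, roughly as in this sketch (the helper name and seed handling are assumptions, not the repo's actual implementation):

```python
import random


def split_coco(coco, percent_train=0.8, seed=0):
    """Split a COCO-format dict into (train, valid) dicts by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * percent_train)
    splits = {}
    for name, subset in (("train", images[:n_train]), ("valid", images[n_train:])):
        ids = {img["id"] for img in subset}
        splits[name] = {
            "images": subset,
            # keep only the annotations whose image survived into this split
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return splits["train"], splits["valid"]


# Example: write data/annotated/{train,valid}.json from a single COCO export
# import json
# with open("data/annotated/export.json") as f:
#     train, valid = split_coco(json.load(f), percent_train=0.8)
```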
To train the model with the chosen hyper-parameters, run
python train.py --batch_size BS --n_epochs N_EPOCHS --lr LR --tensorboard
This will train the model, log training metrics to TensorBoard, and checkpoint the model to a models directory.
Validation metrics will be logged to the terminal.
To perform inference with the trained model, run
python inference.py --model_name model_<EPOCH>.pth --thresh THRESH --tensorboard --data_path test --n_images N
This will load the model checkpoint saved at EPOCH from the models directory, run it on the available GPU, and save the annotated images to a TensorBoard image grid. The bounding-box confidence threshold and the number of test images to annotate are also configurable.
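The --thresh cut-off presumably drops low-confidence detections before drawing. In pure Python the filter reduces to a sketch like this (the helper name is hypothetical):

```python
def filter_detections(boxes, scores, labels, thresh=0.5):
    """Keep only detections whose score is at or above thresh.

    boxes, scores, and labels are parallel lists, as obtained by converting
    a torchvision detection output to Python lists.
    """
    kept = [(b, s, l) for b, s, l in zip(boxes, scores, labels) if s >= thresh]
    if not kept:
        return [], [], []
    b, s, l = zip(*kept)
    return list(b), list(s), list(l)
```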
To annotate a video, run
python inference.py --model_name model_<EPOCH>.pth --thresh THRESH --store_video --data_path test
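The --store_video path presumably runs the same per-frame inference and writes the drawn frames back out with OpenCV. A minimal sketch, where predict_fn is a hypothetical hook returning boxes and labels for one frame:

```python
def clamp_box(box, w, h):
    """Clip box corners to valid integer pixel coordinates for a w x h frame."""
    x1, y1, x2, y2 = box
    clip = lambda v, hi: max(0, min(int(v), hi - 1))
    return clip(x1, w), clip(y1, h), clip(x2, w), clip(y2, h)


def annotate_video(in_path, out_path, predict_fn):
    """Run predict_fn on each frame and write annotated frames to out_path.

    predict_fn(frame) -> (boxes, labels), boxes as [x1, y1, x2, y2] pixels.
    """
    import cv2  # local import keeps clamp_box usable without OpenCV installed

    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for box, label in zip(*predict_fn(frame)):
            x1, y1, x2, y2 = clamp_box(box, w, h)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, str(label), (x1, max(y1 - 4, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        writer.write(frame)
    cap.release()
    writer.release()
```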