Dependency Prediction Networks

This project is for dependency prediction from images.


Getting started

  • This project works with the data produced by the clcv project. To get started, set the CLCV_HOME environment variable to point to the CLCV project: export CLCV_HOME=/path/to/the/clcv/project
  • Now, everything is ready.
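
Before launching a run, it can help to confirm that the variable is actually set. The following is a minimal sketch, assuming only that CLCV_HOME should name an existing directory; the helper name resolve_clcv_home is hypothetical and not part of this project.

```python
import os


def resolve_clcv_home(env=None):
    """Return the absolute CLCV project path taken from CLCV_HOME.

    Hypothetical helper for illustration only; not part of this project.
    """
    env = os.environ if env is None else env
    path = env.get("CLCV_HOME")
    if not path:
        raise RuntimeError(
            "CLCV_HOME is not set; run: export CLCV_HOME=/path/to/the/clcv/project"
        )
    if not os.path.isdir(path):
        raise RuntimeError("CLCV_HOME does not point to a directory: " + path)
    return os.path.abspath(path)
```

Calling resolve_clcv_home() with no arguments reads the real environment; passing a dict makes the check easy to exercise without changing the shell.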


Basic usage (training)

  usage: [-h] [--train_image_dir TRAIN_IMAGE_DIR]
              [--val_image_dir VAL_IMAGE_DIR] [--finetune FINETUNE]
              [--cnn_type {vgg19,resnet152}] [--batch_size BATCH_SIZE]
              [--learning_rate LEARNING_RATE] [--num_epochs NUM_EPOCHS]
              [--lr_update LR_UPDATE] [--max_patience MAX_PATIENCE]
              [--val_step VAL_STEP] [--num_workers NUM_WORKERS]
              [--log_step LOG_STEP]
              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--seed SEED]
              train_label val_label train_imageinfo val_imageinfo

  positional arguments:
    train_label           path to the h5 file containing the training labels
    val_label             path to the h5 file containing the validating labels
    train_imageinfo       imageinfo contains image path
    val_imageinfo         imageinfo contains image path
    output_file           output model file (*.pth)

  optional arguments:
    -h, --help            show this help message and exit
    --train_image_dir TRAIN_IMAGE_DIR
                        path to training image dir
    --val_image_dir VAL_IMAGE_DIR
                        path to validating image dir
    --finetune FINETUNE   Fine-tune the image encoder.
    --cnn_type {vgg19,resnet152}
                        The CNN used for image encoder (e.g. vgg19, resnet152)
    --batch_size BATCH_SIZE
                        batch size
    --learning_rate LEARNING_RATE
                        learning rate
    --num_epochs NUM_EPOCHS
                        max number of epochs to run the training
    --lr_update LR_UPDATE
                        Number of epochs to update the learning rate.
    --max_patience MAX_PATIENCE
                        max number of epoch to run since the minima is
                        detected -- early stopping
    --val_step VAL_STEP   how often do we check the model (in terms of epoch)
    --num_workers NUM_WORKERS
                          number of workers (each worker use a process to load a
                          batch of data)
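
The --lr_update, --val_step and --max_patience options control how the learning rate decays and when training stops. The sketch below shows one common way such options interact; it is not taken from this project's code. The decay factor gamma=0.1, the choice to count patience in validation checks, and the names step_lr, EarlyStopper and simulate are all assumptions made for illustration.

```python
def step_lr(base_lr, epoch, lr_update, gamma=0.1):
    """Learning rate for a 0-indexed epoch under step decay.

    gamma=0.1 is an assumed decay factor; the help text does not state it.
    """
    return base_lr * gamma ** (epoch // lr_update)


class EarlyStopper:
    """Signal a stop after max_patience validation checks without a new minimum."""

    def __init__(self, max_patience):
        self.max_patience = max_patience
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss  # new minimum: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.max_patience


def simulate(val_losses, max_patience, val_step=1):
    """Return (epochs_run, reason) for a list of per-epoch validation losses."""
    stopper = EarlyStopper(max_patience)
    for epoch, loss in enumerate(val_losses, start=1):
        # Validation only happens every val_step epochs.
        if epoch % val_step == 0 and stopper.should_stop(loss):
            return epoch, "early stopping"
    return len(val_losses), "num_epochs reached"
```

With this scheme, a run whose validation loss keeps reaching new minima never triggers the stopper and ends only when the epoch budget is exhausted.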

Example of using make rules for training

    make train GID=0 BATCH_SIZE=128 LEARNING_RATE=0.0001 CNN_TYPE=vgg19 FINETUNE=False NUM_WORKERS=4

Training on multiple GPUs is also supported. For example, to train the same model on the first 4 GPUs (GID=0,1,2,3), use the make command as follows.

    make train GID=0,1,2,3 BATCH_SIZE=128 LEARNING_RATE=0.0001 CNN_TYPE=vgg19 FINETUNE=False NUM_WORKERS=4


Basic usage (testing)

  usage: [-h] [--test_image_dir TEST_IMAGE_DIR]
               [--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS]
               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
               test_label test_imageinfo model_file output_file

  positional arguments:
    test_label            path to the h5 file containing the testing labels info
    test_imageinfo        imageinfo contains image path
    model_file            path to the model file
    output_file           path to the output file

  optional arguments:
    -h, --help            show this help message and exit
    --test_image_dir TEST_IMAGE_DIR
                          path to the image dir
    --batch_size BATCH_SIZE
                          batch size
    --num_workers NUM_WORKERS
                          number of workers (each worker use a process to load a
                          batch of data)

Example of using make rules for testing

    make test GID=0 BATCH_SIZE=128 NUM_WORKERS=4


Experimental setting

  • Dataset MSCOCO 2014
  • Train, val, test sets are dev1, dev2, val respectively
  • Evaluation metric: mean average precision (mAP)
  • Hyperparameters used to train DepNet models:

| Hyperparameter | Value |
| --- | --- |
| batch_size | 128 (64 if finetuning is used) |
| learning_rate | 1e-4 |
| lr_update | 10 |
| num_epochs | 30 |
| max_patience | 5 |
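
For readers unfamiliar with the evaluation metric, here is a minimal sketch of mean average precision for multi-label prediction. It assumes the usual definition (rank the images by score for each concept, average the precision at each positive, then average over concepts); the exact variant used by this project is not stated, and the function names are hypothetical.

```python
def average_precision(scores, labels):
    """AP for one concept: mean of the precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    # A concept with no positives contributes 0.0 (an assumed convention).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP: average the per-concept AP over all concepts (columns)."""
    num_concepts = len(score_matrix[0])
    aps = []
    for c in range(num_concepts):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        aps.append(average_precision(scores, labels))
    return sum(aps) / len(aps)
```

Each row of score_matrix and label_matrix corresponds to one image and each column to one concept.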


Results using my* concepts on the test set (in terms of mAP)

| Run | myconceptsv3 | mydepsv4 | mydepsprepv4 | mypasv4 | mypasprepv4 |
| --- | --- | --- | --- | --- | --- |
| Vgg19 | 0.4496 | 0.1994 | 0.1980 | 0.2121 | 0.2135 |
| ResNet152 | 0.4525 | 0.1957 | 0.1951 | 0.2079 | 0.2091 |
| Vgg19 + Finetune | 0.4925 | 0.1841 | 0.2020 | 0.2183 | 0.2176 |
| ResNet152 + Finetune | 0.5188 | 0.2343 | 0.2320 | 0.2499 | 0.2511 |

Results using ex* concepts on the test set (in terms of mAP)

| Run | exconceptsv3 | exdepsv4 | exdepsprepv4 | expasv4 | expasprepv4 |
| --- | --- | --- | --- | --- | --- |
| Vgg19 | 0.5270 | 0.2414 | 0.2388 | 0.2692 | 0.2714 |
| ResNet152 | 0.5276 | 0.2374 | 0.2350 | 0.2649 | 0.2674 |
| Vgg19 + Finetune | 0.5601 | 0.2168 | 0.2532 | 0.2864 | 0.2858 |
| ResNet152 + Finetune | 0.5897 | 0.2856 | 0.2806 | 0.3177 | 0.3186 |

Some observations

  • Most training runs stop because they reach the maximum number of epochs, not because the early stopping condition fires. Increasing the num_epochs hyperparameter might therefore yield better checkpoints; alternatively, increasing learning_rate and/or lr_update may help the model converge faster.
  • Finetuning is applied to the whole network from the beginning. A better strategy would be to train the last layer first and then start finetuning.
  • Without finetuning, Vgg19 and ResNet152 perform similarly (within about 0.005 mAP of each other in every column).
  • Finetuning improves mAP in most settings, and the gain is larger for ResNet152 than for Vgg19. The exception is Vgg19 on mydepsv4 and exdepsv4, where finetuning lowers mAP.
  • Finetuning requires more memory, and convergence is rather slow during the first few epochs. You may need to increase the max_patience hyperparameter to keep training from stopping too early, and increase num_epochs to get a better result.
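
The effect of finetuning can be read directly off the results tables. The short script below computes the per-column change in mAP when finetuning is enabled; the numbers are copied from the tables above, while the names TABLES and finetune_gains are introduced here for illustration only.

```python
# mAP values copied from the two results tables (test set).
# Each pair is (without finetuning, with finetuning), one pair per column.
TABLES = {
    "my*": {
        "columns": ["myconceptsv3", "mydepsv4", "mydepsprepv4", "mypasv4", "mypasprepv4"],
        "Vgg19": ([0.4496, 0.1994, 0.1980, 0.2121, 0.2135],
                  [0.4925, 0.1841, 0.2020, 0.2183, 0.2176]),
        "ResNet152": ([0.4525, 0.1957, 0.1951, 0.2079, 0.2091],
                      [0.5188, 0.2343, 0.2320, 0.2499, 0.2511]),
    },
    "ex*": {
        "columns": ["exconceptsv3", "exdepsv4", "exdepsprepv4", "expasv4", "expasprepv4"],
        "Vgg19": ([0.5270, 0.2414, 0.2388, 0.2692, 0.2714],
                  [0.5601, 0.2168, 0.2532, 0.2864, 0.2858]),
        "ResNet152": ([0.5276, 0.2374, 0.2350, 0.2649, 0.2674],
                      [0.5897, 0.2856, 0.2806, 0.3177, 0.3186]),
    },
}


def finetune_gains(table, cnn):
    """Map each column name to (finetuned mAP - frozen mAP), rounded to 4 places."""
    frozen, finetuned = table[cnn]
    return {
        column: round(after - before, 4)
        for column, before, after in zip(table["columns"], frozen, finetuned)
    }


if __name__ == "__main__":
    for name, table in TABLES.items():
        for cnn in ("Vgg19", "ResNet152"):
            gains = finetune_gains(table, cnn)
            row = "  ".join(f"{col}={delta:+.4f}" for col, delta in gains.items())
            print(f"{name} {cnn}: {row}")
```

Negative entries mark the columns where finetuning lowered mAP.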