Dependency Prediction Networks

This project is for dependency prediction from images.


Getting started

  • This project works with the data produced by the clcv project. To get started, set the CLCV_HOME environment variable to point to the CLCV project: export CLCV_HOME=/path/to/the/clcv/project
  • Now, everything is ready.
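
Scripts that depend on this variable can fail early with a clear message when it is missing. Below is a minimal sketch, assuming only that CLCV_HOME should point to an existing directory; the helper name get_clcv_home is hypothetical and not part of this project.

```python
import os


def get_clcv_home(environ=os.environ):
    """Return the CLCV project path from CLCV_HOME, or raise a clear error.

    Hypothetical helper for illustration; not part of this project.
    """
    path = environ.get("CLCV_HOME")
    if not path:
        raise RuntimeError(
            "CLCV_HOME is not set; run: export CLCV_HOME=/path/to/the/clcv/project"
        )
    if not os.path.isdir(path):
        raise RuntimeError(f"CLCV_HOME does not point to a directory: {path}")
    return path
```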


Basic usage: training

  usage: [-h] [--train_image_dir TRAIN_IMAGE_DIR]
              [--val_image_dir VAL_IMAGE_DIR] [--finetune FINETUNE]
              [--cnn_type {vgg19,resnet152}] [--batch_size BATCH_SIZE]
              [--learning_rate LEARNING_RATE] [--num_epochs NUM_EPOCHS]
              [--lr_update LR_UPDATE] [--max_patience MAX_PATIENCE]
              [--val_step VAL_STEP] [--num_workers NUM_WORKERS]
              [--log_step LOG_STEP]
              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--seed SEED]
              train_label val_label train_imageinfo val_imageinfo output_file

  positional arguments:
    train_label           path to the h5 file containing the training labels
    val_label             path to the h5 file containing the validating labels
    train_imageinfo       imageinfo contains image path
    val_imageinfo         imageinfo contains image path
    output_file           output model file (*.pth)

  optional arguments:
    -h, --help            show this help message and exit
    --train_image_dir TRAIN_IMAGE_DIR
                        path to training image dir
    --val_image_dir VAL_IMAGE_DIR
                        path to validating image dir
    --finetune FINETUNE   Fine-tune the image encoder.
    --cnn_type {vgg19,resnet152}
                        The CNN used for image encoder (e.g. vgg19, resnet152)
    --batch_size BATCH_SIZE
                        batch size
    --learning_rate LEARNING_RATE
                        learning rate
    --num_epochs NUM_EPOCHS
                        max number of epochs to run the training
    --lr_update LR_UPDATE
                        Number of epochs to update the learning rate.
    --max_patience MAX_PATIENCE
                        max number of epoch to run since the minima is
                        detected -- early stopping
    --val_step VAL_STEP   how often do we check the model (in terms of epoch)
    --num_workers NUM_WORKERS
                        number of workers (each worker use a process to load a
                        batch of data)
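
How --num_epochs, --val_step, --max_patience and --lr_update interact can be sketched as follows. This is an illustrative outline, not the project's training code: the step-decay factor of 0.1, the choice to count patience in epochs since the best validation loss, and the names lr_at_epoch and run_schedule are all assumptions.

```python
def lr_at_epoch(base_lr, epoch, lr_update, decay=0.1):
    """Step decay: multiply the learning rate by `decay` every `lr_update` epochs.

    The decay factor is an assumption; the help text only gives the interval.
    """
    return base_lr * (decay ** (epoch // lr_update))


def run_schedule(val_losses, num_epochs, max_patience, val_step=1):
    """Simulate early stopping over a list of per-epoch validation losses.

    Returns (best_epoch, stop_epoch, reason), where reason is "patience" if
    early stopping fired and "num_epochs" if the epoch budget ran out.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch in range(num_epochs):
        if (epoch + 1) % val_step != 0:
            continue  # only validate every `val_step` epochs
        loss = val_losses[epoch]
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= max_patience:
            return best_epoch, epoch, "patience"
    return best_epoch, num_epochs - 1, "num_epochs"
```

With a validation loss that keeps improving, run_schedule ends with reason "num_epochs", which corresponds to a run that is cut off by the epoch budget instead of by early stopping.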

Example of using make rules for training

    make train GID=0 BATCH_SIZE=128 LEARNING_RATE=0.0001 CNN_TYPE=vgg19 FINETUNE=False NUM_WORKERS=4

Training on multiple GPUs is also supported. For example, to train the same model on the first 4 GPUs (GID=0,1,2,3), use the make command as follows.

    make train GID=0,1,2,3 BATCH_SIZE=128 LEARNING_RATE=0.0001 CNN_TYPE=vgg19 FINETUNE=False NUM_WORKERS=4
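
The Makefile itself is not shown here, so how GID reaches the training script is not documented. One common convention is to expose the selected GPUs through the CUDA_VISIBLE_DEVICES environment variable; the sketch below assumes that convention, and the helper name gpu_env is hypothetical.

```python
import os


def gpu_env(gid, base_env=None):
    """Build an environment restricted to the GPUs listed in `gid`.

    `gid` is a comma-separated string such as "0" or "0,1,2,3".
    Returns (env, num_gpus). Mapping GID to CUDA_VISIBLE_DEVICES is an
    assumption about the Makefile, not something this README confirms.
    """
    ids = [int(part) for part in gid.split(",") if part.strip()]
    if not ids:
        raise ValueError("GID must list at least one GPU id")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return env, len(ids)
```

The returned environment could then be passed to a subprocess that runs the training command.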


Basic usage: testing

  usage: [-h] [--test_image_dir TEST_IMAGE_DIR]
               [--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS]
               [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
               test_label test_imageinfo model_file output_file

  positional arguments:
    test_label            path to the h5 file containing the testing labels info
    test_imageinfo        imageinfo contains image path
    model_file            path to the model file
    output_file           path to the output file

  optional arguments:
    -h, --help            show this help message and exit
    --test_image_dir TEST_IMAGE_DIR
                          path to the image dir
    --batch_size BATCH_SIZE
                          batch size
    --num_workers NUM_WORKERS
                          number of workers (each worker use a process to load a
                          batch of data)

Example of using make rules for testing

    make test GID=0 BATCH_SIZE=128 NUM_WORKERS=4


Experimental setting

  • Dataset: MSCOCO 2014
  • Train, val, test sets are dev1, dev2, val respectively
  • Evaluation metric: mean average precision (mAP)
  • Hyperparameters used to train DepNet models:

      Hyperparameter   Value
      batch_size       128 (64 if finetuning is used)
      learning_rate    1e-4
      lr_update        10
      num_epochs       30
      max_patience     5
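
For reference, mean average precision for multi-label prediction is commonly computed by ranking the images for each concept by predicted score, taking the average precision of that ranking, and averaging over concepts. The sketch below follows that common definition in plain Python. It is not necessarily the exact evaluation code behind the results in this README, and the function names are illustrative.

```python
def average_precision(scores, labels):
    """AP for one concept: mean of precision@k over the ranks k of the positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP over concepts; both matrices are lists of rows, one row per image."""
    num_concepts = len(score_matrix[0])
    aps = []
    for c in range(num_concepts):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        if any(labels):  # skip concepts with no positive example
            aps.append(average_precision(scores, labels))
    return sum(aps) / len(aps) if aps else 0.0
```

Skipping concepts without positives is one possible convention; an evaluation script may handle that case differently.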


Results using my* concepts on the test set (in terms of mAP)

  Run                    myconceptsv3  mydepsv4  mydepsprepv4  mypasv4  mypasprepv4
  Vgg19                  0.4496        0.1994    0.1980        0.2121   0.2135
  ResNet152              0.4525        0.1957    0.1951        0.2079   0.2091
  Vgg19 + Finetune       0.4925        0.1841    0.2020        0.2183   0.2176
  ResNet152 + Finetune   0.5188        0.2343    0.2320        0.2499   0.2511

Results using ex* concepts on the test set (in terms of mAP)

  Run                    exconceptsv3  exdepsv4  exdepsprepv4  expasv4  expasprepv4
  Vgg19                  0.5270        0.2414    0.2388        0.2692   0.2714
  ResNet152              0.5276        0.2374    0.2350        0.2649   0.2674
  Vgg19 + Finetune       0.5601        0.2168    0.2532        0.2864   0.2858
  ResNet152 + Finetune   0.5897        0.2856    0.2806        0.3177   0.3186

Some observations

  • Most models stop because they reach the maximum number of training epochs, not because the early stopping condition fires. Better checkpoints might be obtained by increasing the num_epochs hyperparameter; alternatively, increasing the learning_rate and/or lr_update value may help the model converge faster.
  • Finetuning is applied to the whole network from the beginning. A better strategy would be to train the last layer first and then start finetuning.
  • Without finetuning, Vgg19 and ResNet152 perform similarly (for example, 0.4496 vs. 0.4525 mAP on myconceptsv3).
  • Finetuning boosts performance in most settings, and ResNet152 benefits more than Vgg19: on myconceptsv3, mAP rises from 0.4525 to 0.5188 for ResNet152, compared with 0.4496 to 0.4925 for Vgg19. The exception is Vgg19 on mydepsv4 and exdepsv4, where finetuning lowers mAP.
  • Finetuning requires more memory, and convergence is rather slow during the first few epochs. You may need to increase the max_patience hyperparameter to keep training from stopping too early, and increase num_epochs to get a better result.
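
The finetuning observations can be checked directly against the my* results table. The snippet below only redoes that arithmetic on the reported numbers; the variable and function names are illustrative.

```python
# mAP values copied from the my* results table in this README.
columns = ["myconceptsv3", "mydepsv4", "mydepsprepv4", "mypasv4", "mypasprepv4"]
results = {
    "Vgg19": [0.4496, 0.1994, 0.1980, 0.2121, 0.2135],
    "ResNet152": [0.4525, 0.1957, 0.1951, 0.2079, 0.2091],
    "Vgg19 + Finetune": [0.4925, 0.1841, 0.2020, 0.2183, 0.2176],
    "ResNet152 + Finetune": [0.5188, 0.2343, 0.2320, 0.2499, 0.2511],
}


def finetune_gain(cnn):
    """Absolute mAP change from finetuning, per label set, rounded to 4 places."""
    base = results[cnn]
    tuned = results[cnn + " + Finetune"]
    return {col: round(t - b, 4) for col, b, t in zip(columns, base, tuned)}


for cnn in ("Vgg19", "ResNet152"):
    print(cnn, finetune_gain(cnn))
```

On these numbers, ResNet152 gains on all five label sets, while Vgg19 gains on four and loses 0.0153 mAP on mydepsv4.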


