This is the code for our papers: Gated Hierarchical Attention for Image Captioning and CNN+CNN: Convolutional Decoders for Image Captioning. To run it, you should first install PyTorch 0.3.0.
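A quick way to confirm that the right version is on your path (a minimal sketch; the exact error message is illustrative):

```python
import torch

# The scripts target the old PyTorch 0.3.x API, so newer versions may fail.
assert torch.__version__.startswith("0.3"), (
    "Found torch %s; this code was written for PyTorch 0.3.0" % torch.__version__
)
```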
- Download the MSCOCO2014 dataset here.
- Unzip the files and put the training and validation images in the same folder. Put the captions_val2014.json file in the annotation folder.
- Download Karpathy's split here, put it in the folder data/files/, and then run ak_build_vocab.py in the data folder to preprocess the dataset.
- Download COCO evaluation metrics here. Copy all files to models/coco_eval.
- Set self.image_dir in train.py to the path of the folder from step 2. You can also change other parameters in the configuration; see the sketch after this list.
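The configuration mentioned in the last step is just a set of attributes in train.py. A minimal sketch of what that assignment might look like (image_dir comes from the steps above; the surrounding class and the other parameter names are assumptions, not the repository's actual code):

```python
class Config(object):
    def __init__(self):
        # Path of the folder from step 2 that holds the training and validation images.
        self.image_dir = "/path/to/mscoco2014/images"
        # Other hyper-parameters you may want to change (names are illustrative).
        self.batch_size = 32
        self.learning_rate = 4e-4
        self.num_epochs = 30
```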
After training, you can use inference.py to generate captions for the images in the test split. Again, assign the path of the image folder to self.image_dir.
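If you want to score the generated captions yourself, the files downloaded in the evaluation-metrics step follow the standard coco-caption interface. A minimal sketch, assuming the upstream pycocoevalcap package layout (adjust the imports if you call the copies placed in models/coco_eval, and the image id and captions below are made up):

```python
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Both scorers expect dicts mapping an image id to a list of caption strings.
gts = {184613: ["a man riding a wave on top of a surfboard"]}  # reference captions
res = {184613: ["a man riding a surfboard on a wave"]}         # generated captions

bleu, _ = Bleu(4).compute_score(gts, res)    # list of BLEU-1..BLEU-4 scores
cider, _ = Cider().compute_score(gts, res)
print(bleu, cider)
```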