parse MSCOCO and Flickr captions for V-PCFG
zhaoyanpeng committed Feb 6, 2022
1 parent 42156f6 commit 50379be
Showing 13 changed files with 167,767 additions and 0 deletions.
5 changes: 5 additions & 0 deletions xcfg/data/vpcfg/flickr/README.md
@@ -0,0 +1,5 @@
run `flickr_dl.sh' to prepare data.

2016 test-split in `https://github.com/multi30k/dataset/tree/master/data/task1/image_splits'

image and caption data are from `http://hockenmaier.cs.illinois.edu/DenotationGraph/data/'
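Note on the README above: it names a test-split id list and a separate caption archive, which suggests captions are selected per split by image id. A minimal sketch of that selection follows. The commit does not confirm the file formats, so the caption line format (`<image>.jpg#<n>`, a tab, then the caption) and the id format (one image file name per line) are assumptions, and `captions_for_split` is a hypothetical helper, not code from this repository.

```python
from typing import Dict, Iterable, List, Set


def captions_for_split(
    caption_lines: Iterable[str], split_ids: Set[str]
) -> Dict[str, List[str]]:
    """Group captions by image id, keeping only images listed in split_ids.

    Assumed line format (not confirmed by the commit):
        <image>.jpg#<caption index>\t<caption text>
    """
    selected: Dict[str, List[str]] = {}
    for line in caption_lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip lines that do not match the assumed format
        key, caption = line.split("\t", 1)
        image_id = key.split("#", 1)[0]  # drop the "#<n>" caption index
        if image_id in split_ids:
            selected.setdefault(image_id, []).append(caption)
    return selected
```

Working on an iterable of lines rather than a path keeps the sketch independent of where `flickr_dl.sh` unpacks the archive.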
7 changes: 7 additions & 0 deletions xcfg/data/vpcfg/flickr/flickr_dl.sh
@@ -0,0 +1,7 @@
#wget https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/image_splits/test_2016_flickr.txt -O test_ids.txt
#wget https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/image_splits/train.txt -O train_ids.txt
#wget https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/image_splits/val.txt -O val_ids.txt

wget http://hockenmaier.cs.illinois.edu/DenotationGraph/data/flickr30k.tar.gz
tar -zxvf flickr30k.tar.gz
rm flickr30k.tar.gz
9 changes: 9 additions & 0 deletions xcfg/data/vpcfg/mscoco/README.md
@@ -0,0 +1,9 @@
run `mscoco_dl.sh' to get mscoco annotation data.

`train_ids.txt' is partitioned into 4 parts so that we can parse faster by parsing the parts in parallel.

the splits seem to come from [train,test,val].txt in `coco_precomp' of `wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar'

the top 1000 image ids in the [test,val].txt are used.

see https://github.com/fartashf/vsepp#download-data
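Note on the README above: it describes two id-list operations, keeping 1000 ids from the test and val lists and partitioning the training ids into 4 parts for parallel parsing. The sketch below illustrates both under stated assumptions: "top 1000" is read as the first 1000 lines in file order, and the 4 parts are taken to be contiguous, near-equal chunks. The commit confirms neither detail, and `take_first` and `partition` are hypothetical names.

```python
from typing import List


def take_first(ids: List[str], limit: int = 1000) -> List[str]:
    """Keep the first `limit` ids in file order (assumed meaning of "top 1000")."""
    return ids[:limit]


def partition(ids: List[str], num_parts: int = 4) -> List[List[str]]:
    """Split ids into `num_parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(ids), num_parts)
    parts: List[List[str]] = []
    start = 0
    for i in range(num_parts):
        # The first `extra` chunks absorb the remainder, one id each.
        size = base + (1 if i < extra else 0)
        parts.append(ids[start:start + size])
        start += size
    return parts
```

Each chunk can then be written to its own file and handed to a separate parser process. Contiguous chunks keep the original order recoverable by simple concatenation.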
5 changes: 5 additions & 0 deletions xcfg/data/vpcfg/mscoco/mscoco_dl.sh
@@ -0,0 +1,5 @@
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip annotations_trainval2014.zip
rm annotations_trainval2014.zip
rm annotations/instances_*
rm annotations/person_*