Commit
parse MSCOCO and Flickr captions for V-PCFG
1 parent 42156f6, commit 50379be
Showing 13 changed files with 167,767 additions and 0 deletions.
@@ -0,0 +1,5 @@
run `flickr_dl.sh' to prepare data.

2016 test-split in `https://github.com/multi30k/dataset/tree/master/data/task1/image_splits'

image and caption data are from `http://hockenmaier.cs.illinois.edu/DenotationGraph/data/'
@@ -0,0 +1,7 @@
#wget https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/image_splits/test_2016_flickr.txt -O test_ids.txt
#wget https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/image_splits/train.txt -O train_ids.txt
#wget https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/image_splits/val.txt -O val_ids.txt

wget http://hockenmaier.cs.illinois.edu/DenotationGraph/data/flickr30k.tar.gz
tar -zxvf flickr30k.tar.gz
rm flickr30k.tar.gz
@@ -0,0 +1,9 @@
run `mscoco_dl.sh' to get mscoco annotation data.

`train_ids.txt' is partitioned into 4 parts so that we can parse faster by parsing the parts in parallel.

the splits seem to come from [train,test,val].txt in `coco_precomp' of `wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar'

the top 1000 image ids in the [test,val].txt are used.

see https://github.com/fartashf/vsepp#download-data
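
The commit does not show how `train_ids.txt' was partitioned or how the top 1000 ids were selected. The following is only a minimal sketch of one way to do both, assuming one image id per line; the function names `split_ids' and `top_ids' and the output prefixes are hypothetical and do not come from this repository.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layout are assumptions, not part of the commit.

# split_ids FILE PARTS PREFIX
# Split an id list (one id per line) into PARTS contiguous chunks
# written to PREFIX0, PREFIX1, ... so each chunk can be parsed in parallel.
split_ids() {
    ids_file=$1
    parts=$2
    prefix=$3
    # Count the ids with awk so the result has no padding.
    total=$(awk 'END { print NR }' "$ids_file")
    # Ceiling division: every id lands in exactly one chunk.
    per_part=$(( (total + parts - 1) / parts ))
    awk -v n="$per_part" -v p="$prefix" \
        '{ print > (p int((NR - 1) / n)) }' "$ids_file"
}

# top_ids FILE K
# Print only the first K ids of a list.
top_ids() {
    head -n "$2" "$1"
}

# Example calls (file names are assumptions):
#   split_ids train_ids.txt 4 train_ids.part
#   top_ids test.txt 1000 > test_ids.txt
```

Contiguous chunks keep the original id order within each part, so the parsed outputs can be concatenated back in order once the parallel jobs finish.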
@@ -0,0 +1,5 @@
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip annotations_trainval2014.zip
rm annotations_trainval2014.zip
rm annotations/instances_*
rm annotations/person_*