Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning.
- Create a fresh conda environment, and install all dependencies.

```bash
conda create -n vilbert-mt python=3.6
conda activate vilbert-mt
git clone xxx
cd ViLBert-MT
pip install -r requirements.txt
```
- Install PyTorch.

```bash
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
```
- Install apex, following https://github.com/NVIDIA/apex.
- Install this codebase as a package in this environment.

```bash
python setup.py develop
```
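
After completing the steps above, an optional sanity check (not part of the original instructions) can confirm that PyTorch, apex, and this codebase are all importable. The module name `vilbert` and a CUDA-enabled apex build are assumptions; adjust if your setup differs.

```bash
# Optional post-setup sanity check.
# Assumptions: the conda env is active, apex was built with CUDA support,
# and the codebase installs under the module name "vilbert".
python -c "import torch; print('torch', torch.__version__, 'CUDA available:', torch.cuda.is_available())"
python -c "from apex import amp; print('apex OK')"
python -c "import vilbert; print('vilbert at', vilbert.__file__)"
```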
Check `README.md` under `data` for more details.
To train the model:
To be added
For internal use: copy the pre-trained checkpoints from Skynet.

```bash
cp -a /srv/share3/jlu347/vilbert-MT/save/* <your_directory>
```
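
As an optional check that a copied checkpoint deserializes, you can load it on CPU. The path below is the one used in the fine-tuning commands later in this README; it is assumed here that the `.bin` file is a plain PyTorch state dict.

```bash
# Optional: verify a copied checkpoint loads on CPU.
# Assumption: the .bin file is a state dict (a dict of parameter tensors).
python -c "import torch; sd = torch.load('save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin', map_location='cpu'); print(len(sd), 'entries')"
```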
Task | Sub-Task | Model | LR | Results (split) |
---|---|---|---|---|
VQA | - | ViLBERT | 4e-5 | 70.55 (test-dev) |
- | - | DFAF | - | 70.22 (test-dev) |
Ref Expression | RefCOCO+ | ViLBERT | 4e-5 | 72.34 (val) - 78.52 (testA) - 62.61 (testB) |
- | RefCOCO+ | MAttNet | - | 65.33 (val) - 71.62 (testA) - 56.02 (testB) |
Ref Expression | RefCOCO | ViLBERT | 4e-5 | - |
- | RefCOCO | MAttNet | - | - |
Ref Expression | Refg | ViLBERT | 4e-5 | - |
- | Refg | MAttNet | - | - |
Image Caption Ranking | Image Retrieval | ViLBERT | 2e-5 | 58.20 (R1) - 84.90 (R5) - 91.52 (R10) |
- | Image Retrieval | SCAN | - | 48.60 (R1) - 77.70 (R5) - 85.20 (R10) |
To fine-tune a 6-layer ViLBERT model for VQA with 8 GPUs, run the command below. `--tasks 1` selects the VQA task; check `vilbert_tasks.yml` for more VQA task settings.
```bash
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py \
  --bert_model bert-base-uncased \
  --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin \
  --config_file config/bert_base_6layer_6conect.json \
  --learning_rate 4e-5 --num_workers 16 --tasks 1 --save_name pretrained
```

The same entry point is used for other tasks by changing `--tasks` (and `--num_workers` as needed), e.g. with `--tasks 11`:

```bash
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py \
  --bert_model bert-base-uncased \
  --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin \
  --config_file config/bert_base_6layer_6conect.json \
  --learning_rate 4e-5 --num_workers 16 --tasks 11 --save_name pretrained

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py \
  --bert_model bert-base-uncased \
  --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin \
  --config_file config/bert_base_6layer_6conect.json \
  --learning_rate 4e-5 --num_workers 9 --tasks 11 --save_name pretrained
```
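
If fewer GPUs are available, the same launch can be scaled down by lowering `--nproc_per_node`. The sketch below assumes the remaining flags carry over unchanged; note that the effective batch size shrinks with GPU count, so the learning rate may need retuning.

```bash
# Example: single-node, 4-GPU fine-tuning for VQA (--tasks 1).
# Assumption: all other flags carry over; the effective batch size is smaller,
# so the learning rate may need adjusting.
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --node_rank=0 train_tasks.py \
  --bert_model bert-base-uncased \
  --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin \
  --config_file config/bert_base_6layer_6conect.json \
  --learning_rate 4e-5 --num_workers 16 --tasks 1 --save_name pretrained
```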
Code to be added here.
vilbert-multi-task is licensed under the MIT license; see the LICENSE file.