DangChuong-DC/NAS-VQA-attempt

PyTorch implementation of an attempt at Neural Architecture Search on the Visual Question Answering task

Description: This repository exists for legacy reasons.

If you make use of this code, please cite the following [and star me (0.0)]:

@misc{Dang_2020_NAS_Attempt,
  author = {Dang, Anh-Chuong},
  title = {Attempt Neural Architecture Search on Visual Question Answering task},
  month = {May},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  commit = {master}
}

Abstract

This repository contains a PyTorch implementation of my attempt at NAS on vision-language models (the VQA task). In this work, I used the MCAN-VQA model, factorized its operations, and then applied a search algorithm, i.e. SNAS, to optimize the network's architecture.
Figure 1: Overview of NAS-VQA. For more details, please refer to my code as well as the summary report summary.pdf.
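The search relaxes each factorized operation slot into a weighted mixture of candidate operations, with the mixture weights sampled via the Gumbel-softmax trick as in SNAS. The snippet below is a minimal, hypothetical sketch of such a mixed operation; the candidate operations and class name are illustrative, not the exact ones defined in this repository.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Illustrative candidate operations for one slot of the factorized network;
    # the actual operation set in this repository may differ.
    CANDIDATE_OPS = {
        'identity': lambda dim: nn.Identity(),
        'linear': lambda dim: nn.Linear(dim, dim),
        'ffn': lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)),
    }

    class MixedOp(nn.Module):
        """SNAS-style mixed operation: a Gumbel-softmax weighted sum of candidates."""
        def __init__(self, dim):
            super().__init__()
            self.ops = nn.ModuleList([build(dim) for build in CANDIDATE_OPS.values()])
            # One architecture logit per candidate, learned jointly with the network weights.
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

        def forward(self, x, temperature=1.0):
            # Relaxed one-hot sample over candidates (the SNAS search distribution).
            weights = F.gumbel_softmax(self.alpha, tau=temperature, hard=False)
            return sum(w * op(x) for w, op in zip(weights, self.ops))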

Prerequisites

Dependencies

You should install some necessary packages.

  1. Install Python >= 3.5

  2. Install CUDA >= 9.0 and cuDNN

  3. Install PyTorch >= 1.x with CUDA.

  4. Install SpaCy and initialize the GloVe vectors as follows (a quick sanity check is sketched after these commands):

    $ pip install -r requirements.txt
    $ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
    $ pip install en_vectors_web_lg-2.1.0.tar.gz
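Once the package above is installed, the GloVe vectors can be sanity-checked through spaCy; this snippet is just a quick check and is not part of the repository's code.

    import en_vectors_web_lg

    # Load the pre-trained GloVe vectors shipped with the spaCy model package.
    nlp = en_vectors_web_lg.load()

    # Each token exposes a 300-d GloVe vector used to embed question words.
    doc = nlp('What color is the umbrella?')
    print(doc[1].text, doc[1].vector.shape)  # -> color (300,)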

Setup

The image features are extracted using the bottom-up-attention strategy, with each image being represented as a dynamic number (from 10 to 100) of 2048-D features. We store the features for each image in a .npz file. You can prepare the visual features by yourself or download the extracted features from OneDrive or BaiduYun. The download contains three files: train2014.tar.gz, val2014.tar.gz, and test2015.tar.gz, corresponding to the features of the train/val/test images of VQA-v2, respectively.
For more details of the setup, please refer to the MCAN-VQA repository (https://github.com/MILVLG/mcan-vqa).
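A minimal sketch of loading one of the pre-extracted .npz files and zero-padding it to a fixed number of regions is shown below; the array key 'x' and the helper name are assumptions based on the mcan-vqa preprocessing, so check that repository for the exact format.

    import numpy as np

    def load_image_features(npz_path, max_regions=100, feat_dim=2048):
        """Load bottom-up-attention features and zero-pad to a fixed region count.

        The key 'x' is an assumption based on the mcan-vqa preprocessing;
        adjust it if your .npz files use a different key.
        """
        feats = np.load(npz_path)['x']  # (num_regions, 2048), with 10 <= num_regions <= 100
        padded = np.zeros((max_regions, feat_dim), dtype=np.float32)
        padded[:feats.shape[0]] = feats[:max_regions]
        return padded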

Training

For the search stage, run run_search.py. Command for running the search:

python run_search.py --RUN=str --GPU=str --SEED=int --PRELOAD=bool
  • After you have obtained the desired architecture, copy it into the VQAGenotype namedtuple in the genotypes.py file in the model folder (see the sketch below).
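As a hypothetical sketch of that copy step, the genotype could look like the following; the field names and operation names are illustrative, so use the fields actually defined in genotypes.py.

    from collections import namedtuple

    # Illustrative definition; the real VQAGenotype in model/genotypes.py defines
    # the actual fields (e.g. the chosen operation for each encoder/decoder slot).
    VQAGenotype = namedtuple('VQAGenotype', ['enc_ops', 'dec_ops'])

    # Paste the searched architecture here as a named genotype:
    MY_ARCH = VQAGenotype(
        enc_ops=['self_att', 'ffn', 'self_att', 'ffn'],
        dec_ops=['self_att', 'guided_att', 'ffn', 'guided_att'],
    )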

For the evaluation stage, run run.py. Command for running the evaluation:

python run.py --RUN=str --ARCH_NAME=str --GPU=str --SEED=int --PRELOAD=bool

where:
str: a string of your choice, e.g. for --RUN the option choices are {'train', 'val'}
int: an integer of your choice
bool: a boolean, i.e. True or False
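For example, a search run on a single GPU might be launched as follows; the GPU id and seed below are purely illustrative values.

python run_search.py --RUN='train' --GPU='0' --SEED=42 --PRELOAD=False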

Progression

The project was in progress. However, around the end of April 2020 a great work, which takes a quite similar approach and reports more favorable results, was published, so unfortunately I decided to stop this project.
Published paper (mentioned above): Deep Multimodal Neural Architecture Search
Out of personal curiosity, any further suggestions or advice are welcome.

Implementation References

https://github.com/MILVLG/mcan-vqa
https://github.com/cvlab-tohoku/Dense-CoAttention-Network
