[![Python](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/)
![PyTorch](https://img.shields.io/badge/pytorch-1.0-%237732a8)

**[NEW]:** We have provided the training code of VC R-CNN and a detailed readme file. :star2:

**[NEW]:** The VC Feature pretrained on MSCOCO is provided. Just have a try! :star2:
This repository contains the official PyTorch implementation and the proposed VC feature (beta version) for the [CVPR 2020](http://cvpr2020.thecvf.com/) paper "[Visual Commonsense R-CNN](https://arxiv.org/abs/2002.12204)" (the link points to the arXiv version, which may differ slightly from our final camera-ready version). For technical details, please refer to:

**Visual Commonsense R-CNN** <br />
Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun <br />
In CVPR 2020
If you find our VC feature and code helpful, please kindly consider citing:

```
@article{wang2020visual,
  title={Visual commonsense r-cnn},
  author={Wang, Tan and Huang, Jianqiang and Zhang, Hanwang and Sun, Qianru},
  journal={arXiv preprint arXiv:2002.12204},
  year={2020}
}
```

## Contents
1. [Overview](#overview)
   - [Highlights](#highlights)
   - [What can you get from this repo? [The Road Map]](#what-can-you-get-from-this-repo-the-road-map)
2. [VC Feature](#vc-feature)
   - Pretrained on MSCOCO
   - [How to use after download](#how-to-use-after-download)
   - [Downstream Vision & Language Tasks](#downstream-vision--language-tasks)
3. VC R-CNN Framework


## Overview
This project aims to build a visual commonsense representation learning framework based on the current [object detection codebase](https://github.com/facebookresearch/maskrcnn-benchmark) with un-/self-supervised learning. The **highlights** of our proposed method and feature are listed below:

1. VC Feature:
   - **Effective**: Our visual commonsense representation encodes the "sense-making" knowledge between object RoIs via causal intervention rather than just trivial correlation prediction. Compared to the previously widely used [Up-Down Feature](https://github.com/peteanderson80/bottom-up-attention), our VC can be regarded as effective supplementary knowledge that models the interactions between objects for downstream tasks.
   - **Easy to Use**: As introduced in our paper, the VC Feature is extracted given the RoI box coordinates. The VC Feature can then be **simply concatenated** to the existing visual object features (e.g., the Up-Down Feature) and is ready to roll.
   - **Easy to Expand**: With a learned VC R-CNN framework, we can conveniently extract VC Features for any images and prepare them as an "augmentation feature" for the representation currently in use.
2. VC R-CNN
   - **Fast, Memory-efficient, Multi-GPU**: Our VC R-CNN framework is built on the well-known [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) from Facebook, so it inherits all of its advantages. (It is a pity that [detectron2](https://github.com/facebookresearch/detectron2) had not been released when I was working on this project; however, maskrcnn-benchmark is a stable codebase.)
   - **Supports customized datasets**: Users can easily add COCO-style datasets to train VC R-CNN on other images.



## VC Feature

For ease of use, here we directly provide the VC Features pretrained on the entire **MSCOCO dataset**, computed with the [Up-Down](https://github.com/peteanderson80/bottom-up-attention) feature's boxes, at the links below (this feature is a beta version; we will further fine-tune the VC Feature for better downstream performance). The features are stored in tar.gz format.



#### 10 - 100 features per image (adaptive):

- COCO 2014 Train/Val Image Features (123K / 6G) &ensp; [Google Drive](https://drive.google.com/file/d/1z36lR-CwLjwsJPPxE-phqZTMQCTb5KLV/view?usp=sharing) &ensp; [Baidu Drive (key:ec8x)](https://pan.baidu.com/s/1alOZkyGCJSso_znc2i2REA)
- COCO 2014 Testing Image Features (41K / 2G) &ensp; [Google Drive](https://drive.google.com/file/d/1PQANKKRdD6j980SjokNTCXV5aQNGW4zS/view?usp=sharing) &ensp; [Baidu Drive (key:ec8x)](https://pan.baidu.com/s/1alOZkyGCJSso_znc2i2REA)
- COCO 2015 Testing Image Features (81K / 4G) &ensp; [Google Drive](https://drive.google.com/file/d/1U9-EbQI8ZCFe7MvmJXI1xCDTbW2f-E98/view?usp=sharing) &ensp; [Baidu Drive (key:ec8x)](https://pan.baidu.com/s/1alOZkyGCJSso_znc2i2REA)

**P.S.**: For those who may have no access to the Up-Down feature, here we also provide the features **after concatenation**, so you can use them directly without `numpy.concatenate` (the feature dimension is 3072 = 2048 + 1024); a quick sanity check is sketched after the list below:

- [concat] COCO 2014 Train/Val Image Features (123K / 27G) &ensp; [Google Drive](https://drive.google.com/file/d/1kBnVvph5ISWWljOPeWCdFHWg7lkOp6QX/view?usp=sharing)
- [concat] COCO 2014 Testing Image Features (41K / 9G) &ensp; [Google Drive](https://drive.google.com/file/d/1dSx4BeUJT1NOW6Fdlnmo_3HEbv7zrp1B/view?usp=sharing)
- [concat] COCO 2015 Testing Image Features (81K / 17G) &ensp; [Google Drive](https://drive.google.com/file/d/1Sp8w8BTyiVMJjlSUJWvgFFTaH2AAayJQ/view?usp=sharing)
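A quick sanity check on a downloaded concatenated file might look like the following (a minimal sketch: the file path is hypothetical, and we assume the Up-Down part occupies the first 2048 dimensions):

```python
import numpy as np

# Hypothetical path to one unzipped per-image feature file.
feat = np.load("coco_trainval2014/123456.npy")

# Each row is one RoI; 3072 = 2048 (Up-Down) + 1024 (VC).
assert feat.ndim == 2 and feat.shape[1] == 3072
updown_part, vc_part = feat[:, :2048], feat[:, 2048:]
print(feat.shape, updown_part.shape, vc_part.shape)
```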



### How to use after download

- Unzip the file with the command:
```bash
tar -xzvf file_name
```
- The feature format (the shape of each numpy file is [n x 1024]):
```
coco_trainval/test_year
|---image_id1.npy
|---image_id2.npy
...
|---image_idN.npy
```

- Concatenate it to the existing feature in downstream task training, as sketched below.
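A minimal loading-and-concatenation sketch (the directory names and image id are hypothetical; we assume the Up-Down and VC files for an image list their RoIs in the same order):

```python
import numpy as np

image_id = "123456"  # hypothetical COCO image id

# Per-image features stored as [n x feat_dim] numpy files (see the format above).
updown = np.load(f"updown_trainval2014/{image_id}.npy")  # [n, 2048]
vc = np.load(f"vc_trainval2014/{image_id}.npy")          # [n, 1024]

# Rows must correspond to the same boxes before concatenating channel-wise.
assert updown.shape[0] == vc.shape[0]
fused = np.concatenate([updown, vc], axis=1)             # -> [n, 3072]
```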



### Downstream Vision & Language Tasks
Please check [Downstream Tasks](downstream/README.md) for more details.



#### Tips for using VC Features in downstream tasks

- We recommend **enlarging the dimension** of the first layers (embedding layer, fc, and so on) in the downstream network, since the feature size grows from 2048 to 3072 (for the Up-Down Feature); see the sketch after this list.
- The learning rate can be **slightly reduced**.
- We find that self-attentive operations on the feature (e.g., the refining encoder in AoANet) may hurt the effectiveness of our VC Feature. Details can be found at the bottom of page 7 in our paper.
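As a minimal sketch of the first tip (the module and layer sizes are illustrative, not taken from any particular downstream codebase), the usual change is simply widening the first projection layer to accept the 3072-d concatenated feature:

```python
import torch
import torch.nn as nn

FEAT_DIM = 2048 + 1024  # Up-Down (2048) + VC (1024) after concatenation

class RegionEmbedding(nn.Module):
    """Illustrative first projection layer of a downstream V&L model."""

    def __init__(self, feat_dim: int = FEAT_DIM, hidden_dim: int = 1024):
        super().__init__()
        # Was nn.Linear(2048, hidden_dim) with Up-Down features alone;
        # the input dimension grows to 3072 once VC is concatenated.
        self.fc = nn.Linear(feat_dim, hidden_dim)
        self.act = nn.ReLU(inplace=True)

    def forward(self, regions: torch.Tensor) -> torch.Tensor:
        # regions: [batch, n_boxes, feat_dim] region features
        return self.act(self.fc(regions))

out = RegionEmbedding()(torch.randn(2, 36, FEAT_DIM))  # -> [2, 36, 1024]
```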



If you have any questions or concerns, please kindly email [**Tan Wang**](https://wangt-cn.github.io/).
