[![Python](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/)
![PyTorch](https://img.shields.io/badge/pytorch-1.0-%237732a8)

**[NEW]:** We have provided the training code of VC R-CNN and a detailed readme file. :star2:

**[NEW]:** The VC Feature pretrained on MSCOCO is provided. Just have a try! :star2:

This repository contains the official PyTorch implementation and the proposed VC Feature (beta version) for the [CVPR 2020](http://cvpr2020.thecvf.com/) paper "[Visual Commonsense R-CNN](https://arxiv.org/abs/2002.12204)" (the link is to the arXiv version, which may differ slightly from our final camera-ready version). For technical details, please refer to:

**Visual Commonsense R-CNN** <br />

If you find our VC Feature and code helpful, please consider citing:

```
@article{wang2020visual,
  title={Visual commonsense r-cnn},
  author={Wang, Tan and Huang, Jianqiang and Zhang, Hanwang and Sun, Qianru},
  journal={arXiv preprint arXiv:2002.12204},
  year={2020}
}
```

## Contents
1. [Overview](#overview)
    - [Highlights](#highlights)
    - [What can you get from this repo? [The Road Map]](#what-can-you-get-from-this-repo-the-road-map)
2. Our VC Feature
    - Pretrained on COCO
    - Downstream Models (**to be updated**)
3. VC R-CNN Framework
    - **To be updated**

## Overview
This project aims to build a visual commonsense representation learning framework on top of the current [object detection codebase](https://github.com/facebookresearch/maskrcnn-benchmark) with un-/self-supervised learning. The **highlights** of our proposed method and feature are listed below:

1. VC Feature:
    - **Effective**: Our visual commonsense representation encodes the "sense-making" knowledge between object RoIs with causal intervention rather than just trivial correlation prediction. Compared to the previously widely used [Up-Down Feature](https://github.com/peteanderson80/bottom-up-attention), our VC Feature can be regarded as effective supplementary knowledge that models the interaction between objects for downstream tasks.
    - **Easy to Use**: As introduced in our paper, the VC Feature is extracted by providing the RoI box coordinates. The VC Feature can then be **just concatenated** onto the previous visual object features (e.g., the Up-Down Feature) and is ready to roll (a minimal usage sketch follows this list).
    - **Easy to Expand**: With a learned VC R-CNN framework, we can conveniently extract VC Features for any image and prepare them as an "augmentation feature" for the currently used representation.
2. VC R-CNN:
    - **Fast, Memory-efficient, Multi-GPU**: Our VC R-CNN framework is built on the well-known [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) from Facebook, so it inherits all of its advantages. (It's a pity that [detectron2](https://github.com/facebookresearch/detectron2) had not been released when I was working on this project; still, maskrcnn-benchmark is a stable codebase.)
    - **Supports customized datasets**: Users can easily add COCO-style datasets to train VC R-CNN on other images.
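
As a quick, non-authoritative illustration of what a downloaded per-image feature file contains, here is a minimal sketch; the file name `123456.npy` and directory are hypothetical placeholders (see the feature-format section below for the actual layout):

```python
import numpy as np

# Load one per-image VC Feature file; files are named by COCO image id.
# "coco_trainval2014/123456.npy" is a hypothetical placeholder path.
vc_feat = np.load("coco_trainval2014/123456.npy")

# Each row corresponds to one RoI box; with the adaptive setting,
# the number of boxes n ranges from 10 to 100 per image.
n, dim = vc_feat.shape
print(n, dim)  # e.g., 36 1024 -- each VC Feature vector is 1024-d
```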

## VC Feature

For ease of use, here we directly provide VC Features pretrained on the entire **MSCOCO dataset**, extracted on the [Up-Down](https://github.com/peteanderson80/bottom-up-attention) feature's boxes, at the links below (this feature is a beta version; we will further fine-tune the VC Feature for better downstream performance). The features are stored in tar.gz format.

#### 10 - 100 features per image (adaptive):

- COCO 2014 Train/Val Image Features (123K / 6G) [Google Drive](https://drive.google.com/file/d/1z36lR-CwLjwsJPPxE-phqZTMQCTb5KLV/view?usp=sharing) [Baidu Drive (key: ec8x)](https://pan.baidu.com/s/1alOZkyGCJSso_znc2i2REA)
- COCO 2014 Testing Image Features (41K / 2G) [Google Drive](https://drive.google.com/file/d/1PQANKKRdD6j980SjokNTCXV5aQNGW4zS/view?usp=sharing) [Baidu Drive (key: ec8x)](https://pan.baidu.com/s/1alOZkyGCJSso_znc2i2REA)
- COCO 2015 Testing Image Features (81K / 4G) [Google Drive](https://drive.google.com/file/d/1U9-EbQI8ZCFe7MvmJXI1xCDTbW2f-E98/view?usp=sharing) [Baidu Drive (key: ec8x)](https://pan.baidu.com/s/1alOZkyGCJSso_znc2i2REA)

**PS:** For those who may have no access to the Up-Down feature, we also provide the feature **after concatenation**, which you can use directly without `numpy.concatenate` (the feature dimension is 3072 = 2048 + 1024):

- [concat] COCO 2014 Train/Val Image Features (123K / 27G) [Google Drive](https://drive.google.com/file/d/1kBnVvph5ISWWljOPeWCdFHWg7lkOp6QX/view?usp=sharing)
- [concat] COCO 2014 Testing Image Features (41K / 9G) [Google Drive](https://drive.google.com/file/d/1dSx4BeUJT1NOW6Fdlnmo_3HEbv7zrp1B/view?usp=sharing)
- [concat] COCO 2015 Testing Image Features (81K / 17G) [Google Drive](https://drive.google.com/file/d/1Sp8w8BTyiVMJjlSUJWvgFFTaH2AAayJQ/view?usp=sharing)

### How to use after download

- Unzip the file with the command:
```bash
tar -xzvf file_name
```
- The feature format (the shape of each numpy file is [n x 1024]):
```
coco_trainval/test_year
|---image_id1.npy
|---image_id2.npy
...
|---image_idN.npy
```

- Concatenate it onto the previous feature (e.g., the Up-Down Feature) during downstream task training, as in the sketch below.
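
A minimal concatenation sketch, assuming the two files are box-aligned per image (the VC Feature is extracted on the Up-Down boxes); the file names and the Up-Down feature directory are hypothetical placeholders for your own pipeline:

```python
import numpy as np

# Hypothetical paths: one Up-Down file and one VC file for the same image.
updown = np.load("updown_features/123456.npy")  # assumed shape [n, 2048]
vc = np.load("coco_trainval2014/123456.npy")    # shape [n, 1024]

# Rows are box-aligned, so concatenating along the channel axis yields the
# same 3072-d (2048 + 1024) feature as the pre-concatenated downloads above.
fused = np.concatenate([updown, vc], axis=1)    # shape [n, 3072]
```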

### Downstream Vision & Language Tasks
Please check [Downstream Tasks](downstream/README.md) for more details.

#### Tips for using in downstream tasks

- We recommend **increasing the input dimension** of the starting layers (embedding layer, fc, and so on) in the downstream networks, since the feature size grows from 2048 to 3072 (for the Up-Down Feature); see the sketch after this list.
- The learning rate can be **slightly reduced**.
- We find that self-attentive operations on the feature (e.g., the refining encoder in AoANet) may hurt the effectiveness of our VC Feature. Details can be found at the bottom of page 7 in our paper.
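
For the first tip, a minimal PyTorch sketch; the layer and all sizes except the 2048 -> 3072 input change are hypothetical examples:

```python
import torch
import torch.nn as nn

# Hypothetical input embedding of a downstream model. Only the input
# dimension changes: 2048 (Up-Down alone) -> 3072 (Up-Down + VC).
embed = nn.Linear(3072, 512)      # was nn.Linear(2048, 512) before VC

feats = torch.randn(1, 36, 3072)  # e.g., 36 boxes with fused 3072-d features
hidden = embed(feats)             # shape [1, 36, 512]
```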

If you have any questions or concerns, please kindly email [**Tan Wang**](https://wangt-cn.github.io/).