Merge pull request #119 from facebookresearch/av_dev
Looks great!
dutran authored Jul 18, 2020
2 parents 7a96918 + 1eef973 commit da212a3
Showing 21 changed files with 4,182 additions and 37 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -9,11 +9,13 @@ Currently, this codebase supports the following models:
+ R(2+1)D, MCx models [[1]](https://research.fb.com/wp-content/uploads/2018/04/a-closer-look-at-spatiotemporal-convolutions-for-action-recognition.pdf).
+ CSN models [[2]](https://arxiv.org/pdf/1904.02811.pdf) (**note:pytorch implementation is buggy**).
+ R(2+1)D and CSN models pre-trained on large-scale (65 million!) weakly-supervised public Instagram videos (**IG-65M**) [[3]](https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf).
+ Gradient-Blending for audio-visual modeling [[4]](https://arxiv.org/pdf/1905.12681.pdf) (Caffe2 Only)

## References
1. D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. **A Closer Look at Spatiotemporal Convolutions for Action Recognition.** CVPR 2018.
2. D. Tran, H. Wang, L. Torresani and M. Feiszli. **Video Classification with Channel-Separated Convolutional Networks.** ICCV 2019.
3. D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan, **Large-scale weakly-supervised pre-training for video action recognition.** CVPR 2019.
4. W. Wang, D. Tran, M. Feiszli, **What Makes Training Multi-Modal Classification Networks Hard?** CVPR 2020.


## Supporting Team