## Code is under refactoring and testing; the finetuning part is coming soon.
## Requirements
> This is my experimental environment.

- PyTorch 1.3.0
- Python 3.7.4
- accimage
## Inter-intra contrastive framework
For samples, we have:

- [ ] Inter-positives: samples with the **same labels**, not used for self-supervised learning;
- [x] Inter-negatives: **different samples**, or samples with different indexes;
- [x] Intra-positives: data from the **same sample**, in different views / from different augmentations;
- [x] Intra-negatives: data from the **same sample** in which some kind of information has been broken down. In the video case, temporal information has been destroyed.

Our work makes use of all usable parts (in this classification category) to form an inter-intra contrastive framework. The experiments here are mainly based on Contrastive Multiview Coding.
This framework can be flexibly extended to other contrastive learning methods that use negative samples, such as MoCo and SimCLR.
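
For intuition, here is a minimal sketch of how intra-negatives could enter an InfoNCE-style objective alongside the usual inter-negatives. The function and tensor names are illustrative assumptions, not the repository's API; the actual code follows CMC's contrastive loss.

```
import torch
import torch.nn.functional as F

def contrastive_with_intra_neg(q, k_pos, k_inter_neg, k_intra_neg, tau=0.07):
    """Illustrative InfoNCE-style loss with intra-negatives.

    q:           (B, D) anchor features from View #1
    k_pos:       (B, D) intra-positive features from View #2 (same sample)
    k_inter_neg: (B, N, D) features of other samples (inter-negatives)
    k_intra_neg: (B, M, D) features of temporally broken clips (intra-negatives)
    """
    q = F.normalize(q, dim=-1)
    k_pos = F.normalize(k_pos, dim=-1)
    k_inter_neg = F.normalize(k_inter_neg, dim=-1)
    k_intra_neg = F.normalize(k_intra_neg, dim=-1)

    # positive logit: similarity between the two views of the same clip
    l_pos = (q * k_pos).sum(-1, keepdim=True)              # (B, 1)
    # negative logits: other samples plus the sample's own intra-negatives
    l_inter = torch.einsum('bd,bnd->bn', q, k_inter_neg)   # (B, N)
    l_intra = torch.einsum('bd,bmd->bm', q, k_intra_neg)   # (B, M)

    logits = torch.cat([l_pos, l_inter, l_intra], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```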
### Make the most of data for contrastive learning.
I highly recommend using the pre-computed optical flow images and resized RGB frames.
```
python train_ssl.py --dataset=ucf101
```
This default setting uses frame repeating to generate intra-negative samples for videos. R3D is used by default; you can use `--model` to try different models.
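
As a rough illustration (not the repository's exact code, and assuming a `(C, T, H, W)` clip layout), frame repeating can be sketched as:

```
import torch

def repeat_frames(clip):
    """Build an intra-negative by repeating one frame over time.

    clip: (C, T, H, W) video clip. The result keeps appearance but
    destroys temporal information (all frames become identical).
    """
    t = clip.size(1)
    idx = torch.randint(0, t, (1,)).item()    # pick a random frame
    frame = clip[:, idx:idx + 1]              # (C, 1, H, W)
    return frame.repeat(1, t, 1, 1)           # (C, T, H, W)

# example: a 16-frame clip at 112x112
neg = repeat_frames(torch.rand(3, 16, 112, 112))
```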
We use two views in our experiments. View #1 is an RGB video clip, while View #2 can be an RGB, residual, or optical-flow video clip. Residual video clips are the default modality for View #2; you can use `--modality` to try other modalities. Intra-negative samples are generated from View #1.
It may seem weird to use only one optical flow channel, *u* or *v*. We use only one channel so that **only one model** is needed to handle inputs from different modalities. Using a separate model for each modality is also an optional setting.
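
One possible way to feed a single flow channel to a backbone that expects 3-channel input is to tile that channel; this is only an assumption for illustration, and the repository may handle the channel mismatch differently.

```
import torch

def to_three_channels(flow_u):
    """Tile a single optical-flow channel (u or v) to 3 channels.

    flow_u: (1, T, H, W) tensor; returns (3, T, H, W) so the same
    backbone can consume RGB, residual, and flow clips.
    """
    return flow_u.repeat(3, 1, 1, 1)

clip_flow = to_three_channels(torch.rand(1, 16, 112, 112))
```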
One model is used to handle different views/modalities. You can set `--modality` to decide which modality to use. When setting `--merge=True`, RGB for View #1 and the chosen modality for View #2 will be used jointly for retrieval.
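
A minimal sketch of the kind of joint retrieval that `--merge=True` suggests: features from the two views are concatenated before nearest-neighbour lookup. The concatenation strategy and all names here are assumptions, not the repository's exact procedure.

```
import torch
import torch.nn.functional as F

def joint_retrieval(q_rgb, q_mod, db_rgb, db_mod, k=10):
    """Retrieve the top-k gallery clips for each query clip using both views.

    q_rgb, q_mod:   (Nq, D) query features from View #1 / View #2
    db_rgb, db_mod: (Nd, D) gallery (training set) features
    """
    q = F.normalize(torch.cat([q_rgb, q_mod], dim=1), dim=1)     # (Nq, 2D)
    db = F.normalize(torch.cat([db_rgb, db_mod], dim=1), dim=1)  # (Nd, 2D)
    sim = q @ db.t()                                             # cosine similarities
    return sim.topk(k, dim=1).indices                            # (Nq, k)
```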
With the default training setting, it is easy to get over 30% top-1 accuracy for video retrieval on UCF101 and around 13% top-1 on HMDB51, without joint retrieval.
In this way, only testing is conducted using the given model.
The accuracies using residual clips are not stable on the validation set, so the final testing uses the best model selected on the validation set.
## Results
### Retrieval results
```
x = ((shift_x - x) + 1)/2
```
This is slightly different from the formulation in the papers.
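
For reference, a self-contained version of the residual-clip computation around the snippet above, assuming `x` is a clip of shape `(C, T, H, W)` with values in `[0, 1]`; the exact tensor layout in the repository may differ.

```
import torch

def residual_clip(x):
    """Residual clip from an RGB clip x of shape (C, T, H, W) in [0, 1].

    Each frame is replaced by the difference to its temporally shifted
    copy, rescaled back into [0, 1] as in the snippet above.
    """
    shift_x = torch.roll(x, shifts=1, dims=1)  # shift frames along the temporal axis
    return ((shift_x - x) + 1) / 2

res = residual_clip(torch.rand(3, 16, 112, 112))
```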
We also reimplement VCP in this [repo](https://github.com/BestJuly/VCP). By simply using residual clips, significant improvements can be obtained for both video retrieval and video recognition.
## Citation
If you find our work helpful for your research, please consider citing the paper
If you find the residual input helpful for video-related tasks, please consider citing the paper as well.
## Acknowledgements
Part of this code is inspired by [CMC](https://github.com/HobbitLong/CMC) and [VCOP](https://github.com/xudejing/video-clip-order-prediction).