Deep Learning Project: Combining Self-Supervised Objective Functions in Computer Vision


Project Description

In recent years, deep learning with self-supervision for images has gained a lot of traction in the vision community. As the performance of self-supervised models keeps getting closer to that of their supervised counterparts, two recent papers have stood out to us. First, DINO has set a new SOTA for self-supervised models on ImageNet and shown how Vision Transformers learn to pay attention to important elements of an image in self-supervised settings. Second, although it does not beat the SOTA, Barlow Twins showed that self-supervised models can naturally avoid collapse by using a cross-correlation matrix as the loss function and enforcing its convergence towards the identity matrix.
Our primary goal is to combine ideas from DINO and Barlow Twins to design a new self-supervised architecture featuring both a cross-entropy loss and a loss based on a cross-correlation matrix (a sketch of such a combined objective is given below). As a secondary task, we will attempt to leverage the stability induced by the Barlow Twins loss to discard some of the hyperparameters used in the DINO architecture.
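
For illustration only, here is a minimal PyTorch sketch of what such a combined objective could look like. This is not the code from this repository: the function names, the off-diagonal weight `lambda_offdiag`, the temperatures `tau_s`/`tau_t`, and the mixing weight `weight` are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def barlow_twins_loss(z1, z2, lambda_offdiag=5e-3):
    # Standardize each embedding dimension over the batch
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    n = z1.shape[0]
    # Cross-correlation matrix between the two augmented views
    c = (z1.T @ z2) / n
    # Drive the diagonal towards 1 and the off-diagonal towards 0,
    # i.e. push the cross-correlation matrix towards the identity
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag

def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04):
    # The centered, sharpened teacher output acts as a soft target
    targets = F.softmax((teacher_out - center) / tau_t, dim=-1).detach()
    log_probs = F.log_softmax(student_out / tau_s, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()

def dino_twins_loss(student_out, teacher_out, z1, z2, center, weight=1.0):
    # Cross-entropy term (DINO) plus cross-correlation term (Barlow Twins)
    return dino_loss(student_out, teacher_out, center) \
        + weight * barlow_twins_loss(z1, z2)
```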

Barlow Twins Architecture

DINO Architecture

DINO Twins Architecture

Preliminary Results

We first trained the models on CIFAR-10 in a self-supervised setting, then evaluated them on the same dataset by freezing the weights of the backbone and training a linear layer on top for classification (a sketch of this linear-evaluation protocol is given after the table below). The DINO ResNet-50 and DINO-Twins ResNet-50 models were trained with the same hyperparameters.
Very few experiments have been performed with the ViT so far, and, due to GPU constraints, the batch size used for the ViT is 128 compared to 256 for the ResNet-50-based models.
Our work is still in progress, so the results are subject to change, particularly those of the DINO model, which are far from what would be expected.

Model                     CIFAR-10 Accuracy
Barlow Twins ResNet-50    80.3%
DINO ResNet-50            48.7%
DINO-Twins ResNet-50      85.3%
DINO-Twins ViT-T/4        78.2%
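
As a rough illustration of the evaluation protocol described above, the sketch below freezes a pretrained backbone and trains a single linear classification layer on top. This is not the repository's actual evaluation code: the feature dimension (2048 for a ResNet-50), the optimizer, and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_evaluation(backbone, train_loader, feat_dim=2048, num_classes=10,
                      epochs=10, device="cuda"):
    # Freeze the self-supervised backbone: only the linear head is trained
    backbone.eval().to(device)
    for p in backbone.parameters():
        p.requires_grad = False

    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(head.parameters(), lr=1e-2, momentum=0.9)

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = backbone(x)  # frozen features, no gradients
            loss = F.cross_entropy(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```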

TO DO

  • Explain structure of the repository
  • Include image examples from wandb
  • Clean up the code / pylint
  • Check for data leaks

Issue:

Contributions are not updated for co-authored commits; this should be fixed.
