Stars
[TPAMI reviewing] Towards Visual Grounding: A Survey
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis"
Official Jax Implementation of MaskGIT
Taming Transformers for High-Resolution Image Synthesis
[ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
[IROS 2024] Camera-Radar Fusion for BEV Map and Object Segmentation
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-…
PyTorch code and models for the DINOv2 self-supervised learning method.
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
[CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".
[CVPR 2023] Code for "Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations"
[CVPR 2023] DepGraph: Towards Any Structural Pruning
Code for ALBEF: a new vision-language pre-training method
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Finetuning DINOv2 (https://github.com/facebookresearch/dinov2) on your own dataset
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
LAVIS - A One-stop Library for Language-Vision Intelligence
(ITSC 2021) Optimising the selection of samples for robust lidar-camera calibration. This package estimates the calibration parameters from camera to lidar frame.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
OpenMMLab FewShot Learning Toolbox and Benchmark