Stars
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
PyTorch code for training Vision Transformers with the self-supervised learning method DINO
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
PyTorch code and models for the DINOv2 self-supervised learning method.
Python implementation of the Linde-Buzo-Gray / Generalized Lloyd algorithm for vector quantization.
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Official code for "ControlAR: Controllable Image Generation with Autoregressive Models"
[CVPR 2024] MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
[CVPR 2023] Official repository of Generative Semantic Segmentation
🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
A curated list of awesome resources for camouflaged/concealed object detection (COD).
FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)
Implementations of recent research prototypes/demonstrations using MONAI.
A Collection of Papers and Codes for CVPR2024/CVPR2021/CVPR2020 Low Level Vision
The official repository for "One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts"
BiomedParse: A Foundation Model for Joint Segmentation, Detection, and Recognition of Biomedical Objects Across Nine Modalities
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"