Stars
Official PyTorch implementation of the IEEE TETCI 2024 paper LoCATe-GAT
A simple PyTorch implementation of a CLIP model using DINOv2 and BERT
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Official implementation of the CVPR 2024 highlight paper: Matching Anything by Segmenting Anything
This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024
Official repository for "AM-RADIO: Reduce All Domains Into One"
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
EVA Series: Visual Representation Fantasies from BAAI
[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
[CVPR 2024] Official PyTorch implementation of SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation.
[CVPR 2024] Official implementation of "Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation"
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
Efficient vision foundation models for high-resolution generation and perception.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
crnn chinese_plate_recognition
Official implementation of the paper GEFF: Improving Any Clothes-Changing Person ReID Model using Gallery Enrichment with Face Features.
A Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximu…
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities