Stars
[NeurIPS 2024] SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
[TPAMI, under review] Towards Visual Grounding: A Survey
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
A General-purpose Person Re-identification Task with Instructions
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs (CVPR 2022)
Multimodal chatbot with computer vision capabilities integrated
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Chinese version of CLIP, supporting Chinese cross-modal retrieval and representation generation.
Source code for classification models built on various backbone networks; it can be used to train your own classification model.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
A Python toolkit for the OmniLabel benchmark providing code for evaluation and visualization
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
COCO API - Dataset @ http://cocodataset.org/
EVA Series: Visual Representation Fantasies from BAAI