Stars
[ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models"
Official repo for the paper "HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation"
Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models
[ECCV 2024] - Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Official repository for VisionZip (CVPR 2025)
An introductory LLM tutorial for developers: the Chinese edition of Andrew Ng's large language model course series
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ECCV 2024] Official code for "Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation"
Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"
[CVPR 2023] Official code for "Zero-shot Referring Image Segmentation with Global-Local Context Features"
[CVPR 2023] Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
SGLang is a fast serving framework for large language models and vision language models.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑🔬
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
A More Fair and Comprehensive Comparison between KAN and MLP
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Collection of awesome parameter-efficient fine-tuning resources.
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Collection of AWESOME vision-language models for vision tasks