cv
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
[CVPR 2024] Official repo for "InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model".
Official PyTorch implementation of StreamV2V.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
ElasticTok: Adaptive Tokenization for Image and Video
[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
Official repository of WF-Diff reproductions.
[AAAI 2025] U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
[CVPR 2025] SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Code & Dataset repository for the paper "Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation"