Stars
[CVPR2022] Geometric Transformer for Fast and Robust Point Cloud Registration
[arXiv 2024] MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms
Official PyTorch repo for JoJoGAN: One Shot Face Stylization
Official PyTorch implementation for the ICCV 2021 paper "Learning Motion Priors for 4D Human Body Capture in 3D Scenes", with trained models and data
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
[CVPR2024] Official PyTorch Implementation of SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation.
Official implementation for "iTransformer: Inverted Transformers Are Effective for Time Series Forecasting" (ICLR 2024 Spotlight), https://openreview.net/forum?id=JePfAI8fah
Official implementation for "TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables" (NeurIPS 2024)
This is a repository that implements the Dense NN Retrieval Evaluation used for evaluating the In-Context Learning Capabilities of Vision Encoders.
Library implementation of "No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations"
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"
Official Open Source code for "Masked Autoencoders As Spatiotemporal Learners"
An open source implementation of CLIP.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
GPU Accelerated t-SNE for CUDA with Python bindings
Strong and Open Vision Language Assistant for Mobile Devices
This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024)
[ECCV 2024] Official PyTorch implementation of the paper "Scene-aware Human Motion Forecasting via Mutual Distance Prediction"
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
Real-time and accurate open-vocabulary end-to-end object detection
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
High-resolution models for human tasks.
Code for "Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views" (CVPR 2019, T-PAMI 2021)