Stars
Official code for the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution"
Janus-Series: Unified Multimodal Understanding and Generation Models
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
A generative world for general-purpose robotics & embodied AI learning.
Official repository of Uni-AdaFocus (TPAMI 2024).
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Official repository of InLine attention (NeurIPS 2024)
[NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
[ECCV 2024] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
[ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis"
[TPAMI 2024] Probabilistic Contrastive Learning for Long-Tailed Visual Recognition
[ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
[CVPR 2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
[IEEE TIP] Fine-grained Recognition with Learnable Semantic Data Augmentation
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models (arXiv 2023 / CVPR 2024)
Official repository of Agent Attention (ECCV 2024)
Open-Source Reproduction/Demo of the LLM Riddles Game
Repository of "Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning" (NeurIPS 2023 Spotlight)
[NeurIPS 2023] Rank-DETR for High Quality Object Detection
Jittor implementation of Vision Transformer with Deformable Attention
Official implementation of A Mixture of Surprises for Unsupervised Reinforcement Learning
[ECCV 2022] Learning to Weight Samples for Dynamic Early-exiting Networks