Stars
AnchorAttention: Improved attention for LLM long-context training
Building a comprehensive and handy list of papers for GUI agents
Official Implementation of "Fine-Tuning is Fine, if Calibrated.", NeurIPS 2024
Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition
Computer vision benchmark for evolutionary biology-related tasks.
This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Data and code for "Fake News in Sheep's Clothing: Robust Fake News Detection Against LLM-Empowered Style Attacks" (KDD 2024)
[CVPR-2024] Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
[ECCV 2024] Isomorphic Pruning for Vision Models
Code and data for the NAACL 2024 Findings paper: Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Group Distributionally Robust Dataset Distillation with Risk Minimization
[CVPR 2024] On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
(NeurIPS 2023 spotlight) Large-scale Dataset Distillation/Condensation; 50 IPC (Images Per Class) achieves the highest accuracy, 60.8%, on the original ImageNet-1K val set.
[CVPR2024] Efficient Dataset Distillation via Minimax Diffusion
[CVPR 2024] Exploring Orthogonality in Open World Object Detection
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models (arXiv 2023 / CVPR 2024)
Code for our papers: "Generating images of rare concepts using pre-trained diffusion models" (AAAI 24) and "Norm-guided latent space exploration for text-to-image generation" (NeurIPS 23)
Zhejiang University Graduation Thesis LaTeX Template
Official implementations for paper: Anydoor: zero-shot object-level image customization
Official implementations for paper: LivePhoto: Real Image Animation with Text-guided Motion Control
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
List of recent advances for human avatars, including generation, reconstruction, and editing.
PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
A family of open-source Mixture-of-Experts (MoE) Large Language Models