- Beijing
- https://www.chunyuwang.org/
Stars
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Generalist and Lightweight Model for Named Entity Recognition (extracts any entity type from text) @ NAACL 2024
[IJCV-2021] FairMOT: On the Fairness of Detection and Re-Identification in Multi-Object Tracking
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Latent-based super-resolution (SR) using MoE and a frequency-augmented VAE decoder
Generative Representational Instruction Tuning
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Learning from synthetic data - code and models
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image …
A large-scale text-to-image prompt gallery dataset based on Stable Diffusion
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Data-Efficient Multimodal Fusion on a Single GPU
Lumina-T2X is a unified framework for Text to Any Modality Generation
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
GPT-4V-level open-source multimodal model based on Llama3-8B
From anything to mesh like human artists. Official implementation of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
Densely Captioned Images (DCI) dataset repository.
Data release for the ImageInWords (IIW) paper.