MGTV
- Changsha, China
- https://zheng222.github.io/
Stars
FastVideo is an open-source framework for accelerating large video diffusion models.
A vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Official repository of Human3.6M 3D WholeBody (H3WB) dataset
[Arxiv 2024] MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curatio…
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
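cube studio: an open-source, cloud-native, one-stop machine learning/deep learning/large-model AI platform. Supports SSO login, multi-tenancy, big-data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference services, VGPU, edge computing, serverless, an annotation platform, automated annotation, dataset management, large-model fine-tuning, vllm large-model inference, llmops, private knowledge bases, and an AI model app store; supports one-click model development/inference/fine-tuning,…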
Official implementation for the paper: Zero-shot Image Editing with Reference Imitation
SEED-Voken: A Series of Powerful Visual Tokenizers
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
A generative speech model for daily dialogue.
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
GPT4V-level open-source multi-modal model based on Llama3-8B
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos