Stars
A procedural Blender pipeline for photorealistic training image generation
A small Python module for downloading models from Sketchfab.
[NeurIPS 2024] Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
Let your Claude think
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Official code for PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking (ICCV 2023)
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module: lmms-eval.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024)
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
[CVPR 2024] On the Content Bias in Fréchet Video Distance
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Implements VAR+CLIP for text-to-image (T2I) generation
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).