[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook · 6,500 stars · 427 forks · Updated Jan 12, 2025

Tencent / HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python · 8,129 stars · 653 forks · Updated Jan 24, 2025

Stability-AI / sd3.5

Python · 949 stars · 67 forks · Updated Jan 8, 2025

baaivision / Emu

Emu Series: Generative Multimodal Models from BAAI

Python · 1,676 stars · 86 forks · Updated Sep 27, 2024

timothybrooks / instruct-pix2pix

Python · 6,496 stars · 544 forks · Updated Mar 3, 2024

Open-Source-O1 / Open-O1

Python · 1,160 stars · 41 forks · Updated Nov 21, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python · 1,838 stars · 126 forks · Updated Oct 30, 2024

TencentARC / SmartEdit

Official code of SmartEdit [CVPR-2024 Highlight]

Python · 285 stars · 9 forks · Updated Jun 21, 2024

Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Python · 39,978 stars · 5,131 forks · Updated Oct 10, 2024

PKU-YuanGroup / LLaVA-CoT

LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python · 1,792 stars · 64 forks · Updated Jan 22, 2025

jingyaogong / minimind-v

🚀 Large models: train a 27M-parameter vision-language multimodal model (VLM) from scratch in just 3 hours! 🌏

Python · 884 stars · 87 forks · Updated Dec 13, 2024

jingyi0000 / VLM_survey

Collection of AWESOME vision-language models for vision tasks

2,476 stars · 195 forks · Updated Dec 3, 2024

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook · 6,738 stars · 488 forks · Updated Feb 7, 2025

HandsOnLLM / Hands-On-Large-Language-Models

Official code repo for the O'Reilly Book - "Hands-On Large Language Models"

Jupyter Notebook · 4,401 stars · 1,011 forks · Updated Feb 5, 2025

jingyaogong / minimind

🚀🚀 Large models: train a small 26M-parameter GPT completely from scratch in just 50 minutes! 🌏

Python · 7,818 stars · 802 forks · Updated Dec 13, 2024

hassony2 / useful-computer-vision-phd-resources

Lists of resources useful for my PhD in computer vision

556 stars · 102 forks · Updated Jan 16, 2022

pengsida / learning_research

My personal research experience

6,281 stars · 375 forks · Updated Feb 6, 2025

gokayfem / awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

Markdown · 608 stars · 31 forks · Updated Sep 8, 2024

MLNLP-World / Paper-Writing-Tips

Paper Writing Tips: a repository compiled by the MLNLP community to help authors avoid minor mistakes in paper submissions.

3,745 stars · 478 forks · Updated May 29, 2022

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio, the chat and pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,460 stars · 113 forks · Updated Aug 13, 2024

idealo / imagededup

😎 Finding duplicate images made easy!

Python · 5,239 stars · 464 forks · Updated Dec 19, 2024

chaofengc / Awesome-Image-Quality-Assessment

A comprehensive collection of IQA papers

TeX · 1,094 stars · 73 forks · Updated Jan 5, 2025

dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Python · 1,426 stars · 211 forks · Updated Apr 3, 2024

baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Python · 4,124 stars · 298 forks · Updated Nov 8, 2024

Xiujie Song xiujiesong