Shanghai Jiao Tong University
- Shanghai
Stars
Fully open reproduction of DeepSeek-R1
An elegant \LaTeX\ résumé template. Mainland China mirror: https://gods.coding.net/p/resume/git
An Approach to Enhancing the Efficacy of Post-Training Using Synthetic Data by Iterative Data Selection
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Emu Series: Generative Multimodal Models from BAAI
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Official code of SmartEdit [CVPR-2024 Highlight]
High-Resolution Image Synthesis with Latent Diffusion Models
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
🚀 [Large models] Train a 27M-parameter vision-language model (VLM) from scratch in just 3 hours! 🌏
Collection of AWESOME vision-language models for vision tasks
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
🚀🚀 [Large models] Train a small 26M-parameter GPT entirely from scratch in just 50 minutes! 🌏
Lists of resources useful for my PhD in computer vision
Famous Vision Language Models and Their Architectures
A repository compiled by the MLNLP community to help authors avoid small mistakes in paper submissions. Paper Writing Tips
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
A comprehensive collection of IQA papers
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
A series of large language models developed by Baichuan Intelligent Technology