Stars
Implementation of the proposed minGRU in PyTorch
Official Implementation of LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
Real-time face swap and one-click video deepfake with only a single image
The only guide you need to learn everything about GMM
Unofficial implementation of PatchCore anomaly detection
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
GLM-4 series: Open Multilingual Multimodal Chat LMs
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
GPT4V-level open-source multi-modal model based on Llama3-8B
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
Code for paper "New Benchmarks for Barcode Detection using both Synthetic and Real Data" https://link.springer.com/chapter/10.1007%2F978-3-030-57058-3_34
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Implementation of the paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"
Annotated version of the Mamba paper
Latte: Latent Diffusion Transformer for Video Generation.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Structured state space sequence models
Penpot: The open-source design tool for design and code collaboration
Instant voice cloning by MIT and MyShell.
Official repository of Agent Attention (ECCV2024)