Starred repositories
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
A series of large language models trained from scratch by developers @01-ai
Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)
[CSUR] A Survey on Video Diffusion Models
Retrieval and Retrieval-augmented LLMs
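Provides a practical interactive interface for GPT, GLM, and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…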
An open source implementation of CLIP.
Easily compute CLIP embeddings and build a CLIP retrieval system with them
A linear estimator on top of CLIP to predict the aesthetic quality of pictures
You can do anything with SOTA AI via prompts, auto AI tools, VL large model fine-tuning, and projects
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
OpenLMLab / OpenChineseLLaMA
Forked from meta-llama/llama
Chinese large language model base generated through incremental pre-training on Chinese datasets
ImageBind: One Embedding Space to Bind Them All
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editin…
Official Code for DragGAN (SIGGRAPH 2023)
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Chinese and English multimodal conversational language model
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
🦜🔗 Build context-aware reasoning applications
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational LLM)