Stars
Chinese NLP solutions (large models, data, models, training, inference)
Video+code lecture on building nanoGPT from scratch
The simplest, fastest repository for training/finetuning medium-sized GPTs.
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (usage sketch after this list).
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
Making LLaVA Tiny via MoE Knowledge Distillation
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model (usage sketch after this list).
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
[ECCV2024] Official implementation of the paper "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Contextual Object Detection with Multimodal Large Language Models
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
A model compression and acceleration toolbox based on PyTorch.
A curated reading list of research in Mixture-of-Experts (MoE).
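
For the AutoAWQ entry above, a minimal sketch of the quantization flow its docs describe: load an FP16 model, calibrate and quantize to 4 bits, then save. The model path, output directory, and config values here are illustrative, not prescribed.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # illustrative source model
quant_path = "mistral-7b-awq"             # illustrative output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized model and tokenizer for inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```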
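For the Segment Anything entry above, a minimal point-prompted inference sketch in the style of the repo's notebooks. The checkpoint filename matches the released ViT-H weights; the image path and click coordinates are placeholders.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a downloaded ViT-H checkpoint and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an HxWx3 uint8 RGB array.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with one foreground point (label 1); with multimask_output=True
# the predictor returns three candidate masks with quality scores.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]
```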