Stars
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Ola: Pushing the Frontiers of Omni-Modal Language Model
Wan: Open and Advanced Large-Scale Video Generative Models
A multilingual large voice generation model providing full-stack capabilities for inference, training, and deployment.
Large Language Model Text Generation Inference
A high-throughput and memory-efficient inference and serving engine for LLMs
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Go from images to inference with no labeling (use foundation models to train supervised models).
Segment Anything in High Quality [NeurIPS 2023]
ModelScope: bring the notion of Model-as-a-Service to life.
Self-Supervised Speech Pre-training and Representation Learning Toolkit
The official repo of the Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
The official repo of the Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
A modular graph-based Retrieval-Augmented Generation (RAG) system
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
A generative speech model for daily dialogue.
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
An open-source implementation of CLIP.