Stars
A curated list of recent diffusion models for video generation, editing, and various other applications.
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Mixture-of-Experts for Large Vision-Language Models
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
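A project for training a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization (DPO). The model has 1B parameters and supports Chinese and English.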
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
ChenJian7578 / yolov8
Forked from ultralytics/ultralytics
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
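Function: uses the recording file recognition API of Alibaba Cloud's intelligent speech service to transcribe video and audio files into srt subtitles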