Starred repositories
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
HunyuanVideo: A Systematic Framework For Large Video Generation Model
We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference image and a se…
OpenOCR: A general OCR system with accuracy and efficiency. It supports 24 scene text recognition methods trained from scratch on large-scale real datasets and will continue to add the latest methods.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
The Open-Source Data Annotation Platform
A data annotation toolbox that supports image, audio, and video data.
A high-quality tool for converting PDF to Markdown and JSON: a one-stop, open-source, high-quality data extraction tool.
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.
A modular graph-based Retrieval-Augmented Generation (RAG) system
High-performance C++ library for multiphysics and multibody dynamics simulations
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Convert PDF to Markdown and JSON quickly and with high accuracy.
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Strong and Open Vision Language Assistant for Mobile Devices
A minimalist environment for decision-making in autonomous driving
Development repository for the Triton language and compiler
Fast and memory-efficient exact attention
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023).