Stars
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
Python tool for converting files and office documents to Markdown.
A feature-rich command-line audio/video downloader
👾 Fast and simple video download library and CLI tool written in Go
OpenBMB / mlc-MiniCPM
Forked from mlc-ai/mlc-llm. MiniCPM on Android platform.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
real time face swap and one-click video deepfake with only a single image
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Empowering RAG with a memory-based data interface for all-purpose applications!
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Microsoft's GraphRAG + AutoGen + Ollama + Chainlit = Fully Local & Free Multi-Agent RAG Superbot
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
Cross-platform video extraction tool: supports streaming download, video download, m3u8 download, and Bilibili video download, with Windows and Mac desktop clients.
GraphRAG using Local LLMs - Features robust API and multiple apps for Indexing/Prompt Tuning/Query/Chat/Visualizing/Etc. This is meant to be the ultimate GraphRAG/KG local LLM app.
SEED-Story: Multimodal Long Story Generation with Large Language Model
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
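Updated February 2025: Five Star Sports live sources, Migu, live sources for the five major football leagues, F1 live sources, IPTV live TV sources, APTV live TV sources, IPTV streaming software, IPTV M3U live sources for China, Taiwan/Hong Kong/Macau, and overseas, TV viewing tools, the latest working iptv4/iptv6 live sources, TVBox interfaces, bonus program sources, IPTV checker tools, and alternatives to the Dianshijia (电视家) app.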
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.