Starred repositories
Generate high-definition short videos with one click using large AI models.
sjh724 / VideoPlus
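Forked from ddean2009/MoneyPrinterPlus
Batch-generate all kinds of short videos with one click using AI, automatically batch-remix short videos, and automatically publish them to Douyin, Kuaishou, Xiaohongshu, and WeChat Channels. Making money has never been this easy! Supports the local speech models chatTTS, fasterwhisper, and GPTSoVITS, and the cloud speech services Azure, Alibaba Cloud, and Tencent Cloud. Supports direct AI image generation with Stable Diffusion and ComfyUI. Generate short videos with one click using A…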
SkyReels V1: The first and most advanced open-source human-centric video foundation model
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
A novel approach to Hunyuan image-to-video sampling
DeepSeekV3 is recommended. Supports WeChat and QQBot. A more realistic LLM-based emotional companionship program: meet the characters of your dreams.
😎 Rich ecosystem, 🧩 extensible, 🦄 multimodal - an LLM-native instant messaging bot platform | Works with QQ / WeChat (WeCom, personal WeChat) / Feishu / DingTalk / Discord / Telegram and other messaging platforms | Supports ChatGPT, DeepSeek, Dify, Claude, Gemini, xAI Grok, Ollama, LM Studio, Alibaba Cloud Bailian, SiliconFlow, Qwen, Moonshot, Chat…
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
🎵 A project that generates music from lyrics and deploys a machine learning model online at zero cost (frontend: Vue3.js + Vite; backend: GitHub Actions + Python + PaddlePaddle)
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
A website for quickly creating music from text, built on suno.ai
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
The fastest digital human algorithm, now on your desktop.
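👩🏿💻👨🏾💻👩🏼💻👨🏽💻👩🏻💻 A list of projects by independent developers in China -- sharing what everyone is working on
MultiBot Chat is a Streamlit-based multi-bot chat application that supports multiple large language model (LLM) APIs, including OpenAI, AzureOpenAI, ChatGLM, CoZe, Qwen, Ollama, XingHuo, DeepSeek, Moonshot, Yi, and Groq. It lets users talk with several AI chatbots at once, compare the answers of different models, and hold group-chat-style discussions.
ByteDance Youth Training Camp x MarsCode: a frontend LLM chat box that connects to the coze API
A server that adapts reverse-engineered chat interfaces for multiple AI services (openai-api, coze, deepseek, cursor, windsurf, blackbox, you, grok, bing image generation) to the standard OpenAI API interface.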
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compa…
OCR, layout analysis, reading order, table recognition in 90+ languages
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.