Stars
Streamlit — A faster way to build and share data apps.
Official PyTorch implementation of the ACM MM'21 paper: Keyframe Extraction from Motion Capture Sequences with Graph based Deep Reinforcement Learning
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The …
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
ComfyUI node packaging for Amphion-MaskGCT (zero-shot voice synthesis) and OpenAI-whisper-large-v3 (speech-to-text)
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
A simple screen-parsing tool for pure vision-based GUI agents
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more. It offers real-time capabilities to see, hear, and speak, along with…
LLM Agent Framework in ComfyUI. Includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, provides access to Feishu and Discord, and adapts to all LLMs with similar OpenAI / aisuite interfaces, such as…
GLSL node for ComfyUI
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Mitsuba 3: A Retargetable Forward and Inverse Renderer
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
AirLLM: 70B inference with a single 4GB GPU
Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and …
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
This is the official reproduction of FancyVideo.
Recommended based on ComfyUI node pictures: Joy_caption + MiniCPMv2_6-prompt-generator + florence2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
🔥 2D and 3D face alignment library built using PyTorch
Demo Programs for the "Talking Head(?) Anime from a Single Image 4: Improved Models and Its Distillation" Project