Stars
Real-time Speech-Text Foundation Model Toolkit (wip)
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for designing complex, interactive environments where agents can act,…
A simple, easy-to-hack GraphRAG implementation
HumanLayer enables AI agents to communicate with humans in tool-based and async workflows. Guarantee human oversight of high-stakes function calls with approval workflows across slack, email and mo…
Repository for the Lux AI Challenge, season 3 @NeurIPS 24. Hosted on @kaggle
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Various AI scripts. Mostly Stable Diffusion stuff.
Train high-quality text-to-image diffusion models in a data- and compute-efficient manner
[ICCV 2023] DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders
[ECCV 2024] OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models
A general fine-tuning kit geared toward diffusion models.
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
LLaVA-JP is a Japanese VLM trained with the LLaVA method
A repository of Japanese Phoneme-Level BERT
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.