Starred repositories
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Generative AI extensions for onnxruntime
Code for the paper Hybrid Spectrogram and Waveform Source Separation
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
OCR software for recognition of handwritten text
OCR, layout analysis, reading order, table recognition in 90+ languages
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓
A library for advanced large language model reasoning
Ocular is a state-of-the-art historical OCR system.
Python-based tools for document analysis and OCR
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
An innovative library for efficient LLM inference via low-bit quantization
Fast inference engine for Transformer models
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
adefossez / demucs
Forked from facebookresearch/demucs. Code for the paper Hybrid Spectrogram and Waveform Source Separation
Efficient vision foundation models for high-resolution generation and perception.
Port of OpenAI's Whisper model in C/C++
A throughput-oriented high-performance serving framework for LLMs
Manipulate audio with a simple and easy high level interface
extract text from any document. no muss. no fuss.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"