Starred repositories

SOTA Open Source TTS

Python 18,275 1,367 Updated Jan 12, 2025

[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 927 100 Updated Jan 9, 2025

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Python 1,731 85 Updated Oct 31, 2024

Generative AI extensions for onnxruntime

C++ 574 145 Updated Jan 11, 2025

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 8,553 1,100 Updated Apr 24, 2024

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Python 27,068 5,552 Updated Jan 12, 2025

Official repository for LTX-Video

Python 2,494 204 Updated Jan 3, 2025

OCR software for recognition of handwritten text

Jupyter Notebook 779 241 Updated Dec 23, 2022

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 15,390 994 Updated Jan 11, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 766 69 Updated Dec 30, 2024

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓

2,243 127 Updated Dec 17, 2024

A library for advanced large language model reasoning

Python 1,651 143 Updated Jan 10, 2025

Ocular is a state-of-the-art historical OCR system.

Java 257 48 Updated Jun 7, 2024

Python-based tools for document analysis and OCR

Jupyter Notebook 3,429 592 Updated May 22, 2021

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 10,242 956 Updated Jan 12, 2025

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 1,937 100 Updated Jan 12, 2025

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,150 210 Updated Oct 8, 2024

An innovative library for efficient LLM inference via low-bit quantization

C++ 352 38 Updated Aug 30, 2024

Official inference framework for 1-bit LLMs

C++ 12,584 881 Updated Dec 20, 2024

Low-bit LLM inference on CPU with lookup table

C++ 641 48 Updated Jan 9, 2025

Fast inference engine for Transformer models

C++ 3,517 310 Updated Dec 18, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 21,288 2,205 Updated Nov 11, 2024

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 1,134 109 Updated Jul 15, 2024

Efficient vision foundation models for high-resolution generation and perception.

Python 2,540 204 Updated Dec 24, 2024

Port of OpenAI's Whisper model in C/C++

C++ 36,841 3,786 Updated Jan 9, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda 692 29 Updated Sep 21, 2024

Manipulate audio with a simple and easy high level interface

Python 9,093 1,060 Updated Jul 25, 2024

extract text from any document. no muss. no fuss.

HTML 3,954 612 Updated Dec 2, 2024

PDF to Markdown with vision models

Python 7,891 475 Updated Dec 18, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 8,854 1,164 Updated Jan 9, 2025