
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 689 30 Updated Feb 25, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 5,332 447 Updated Feb 28, 2025

This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025.

Python 11 Updated Feb 24, 2025

Genome modeling and design across all domains of life

Jupyter Notebook 2,284 204 Updated Feb 24, 2025
Python 169 15 Updated Feb 23, 2025

The official implementation of TokenSynth (ICASSP 2025)

Python 44 1 Updated Feb 19, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 381 22 Updated Feb 13, 2025

s1: Simple test-time scaling

Python 5,767 656 Updated Feb 23, 2025

Fully open reproduction of DeepSeek-R1

Python 21,802 1,931 Updated Mar 1, 2025

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 257 33 Updated Jan 15, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,138 440 Updated Mar 1, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 10,807 1,386 Updated Feb 1, 2025
Python 69 4 Updated Jan 22, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 187 20 Updated Feb 24, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 373 27 Updated Feb 14, 2025

Training Large Language Model to Reason in a Continuous Latent Space

Python 922 78 Updated Jan 24, 2025

Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.

Python 677 22 Updated Jan 23, 2025

Code for the paper "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"

Python 54 5 Updated Jan 17, 2025

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Python 205 9 Updated Jun 17, 2024

A parallel ODE solver for PyTorch

Python 248 18 Updated Oct 3, 2024

The best OSS video generation models

Python 2,950 310 Updated Jan 8, 2025

Fourier Dual Diffusion

Python 46 1 Updated Feb 28, 2025

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 3,470 209 Updated Feb 12, 2025

Out-of-the-box (OOTB) GUI Agent for Windows and macOS

Python 1,332 126 Updated Feb 26, 2025

Awesome speech/audio LLMs, representation learning, and codec models

907 58 Updated Feb 28, 2025

Awesome speech/audio LLMs, representation learning, and codec models

3 Updated Oct 12, 2024

Speaker change detection using SincNet and an LSTM/Transformer

Jupyter Notebook 47 6 Updated Jun 30, 2024

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python 692 71 Updated Dec 11, 2024
TypeScript 10,542 609 Updated Mar 1, 2025