
The core of VOICEVOX, a free, medium-quality text-to-speech software

Rust 915 122 Updated Mar 4, 2025

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 14,268 1,543 Updated Mar 4, 2025

The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".

Python 253 57 Updated Apr 23, 2024

A transformer-based network model for pitch detection

Python 164 6 Updated Dec 19, 2023

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,704 2,384 Updated Aug 12, 2024

Mamba SSM architecture

Python 14,139 1,231 Updated Jan 18, 2025

The LLM Evaluation Framework

Python 5,344 449 Updated Mar 3, 2025

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 6,974 841 Updated Mar 1, 2025

Generative Models by Stability AI

Python 25,434 2,821 Updated Sep 4, 2024

The Open Source Code of UniAudio

Python 546 32 Updated Jul 22, 2024

PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.

Python 320 44 Updated Feb 9, 2024

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

Python 1,004 92 Updated Jan 15, 2025

CKIP Transformers

Python 716 75 Updated Apr 21, 2023

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch

Python 3,231 263 Updated Sep 6, 2023

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

C 4,761 971 Updated Mar 3, 2025

A PyTorch-based Speech Toolkit

Python 9,452 1,445 Updated Feb 27, 2025

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch

Python 2,507 275 Updated Jan 12, 2025

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,614 319 Updated Jan 4, 2024

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

Python 654 114 Updated Jan 19, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 38,163 4,777 Updated Aug 16, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 37,125 4,378 Updated Aug 19, 2024