Stars
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Repository for Accent Recognition (Hackathon @SLT2022)
Self-Supervised Speech Pre-training and Representation Learning Toolkit
VoiceBench: Benchmarking LLM-Based Voice Assistants
This toolbox aims to unify audio generation model evaluation for easier comparison.
Algorithms for intelligent assessment of human personality traits from multimodal data, used to rank potential candidates for professional responsibilities
Learn how to use the Cognitive Services Python SDK with these samples
A collection of datasets for the purpose of emotion recognition/detection in speech.
A Compact and Effective Pretrained Model for Speech Emotion Recognition
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
A collection of 8 English speech datasets for SER
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
Robust Speech Recognition via Large-Scale Weak Supervision
A list of Korean speech recognition (STT) APIs, with performance benchmarks for each.
The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these factors with real speech and noise datasets.
Benchmark for personality traits prediction with neural networks