- National Taiwan University
- Hsinchu, Taiwan
- roger-tseng.github.io
Starred repositories
Official implementation of the Interspeech 2024 paper "Lightweight Transducer Based on Frame Level Criterion".
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Interspeech2024 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Inference and training library for high-quality TTS models.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Taiwanese Speech Synthesis with Tacotron2
**Official** Hung-yi Lee (李宏毅) Machine Learning 2021 Spring
Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊
"SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm
**Interspeech 2022** "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks": speech processing with the prompting paradigm
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
🧠 A study guide to learn about Transformers
An open source implementation of CLIP.
Meaningful titles for tabs and PDF downloads! Also supports tab search.
Port of OpenAI's Whisper model in C/C++
Implementation of multi-level Contrastive Predictive Coding (CPC) methods
Zero-Resource Speech Discovery, Search, and Evaluation Tools
Book in preparation: introduction to theoretical computer science
Segment an audio file and obtain utterance alignments. (Python package)
X (weighted / probabilistic) Context-Free Grammars
Large, modern dataset for speech recognition
Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling