roger-tseng

Starred repositories

Official implementation of the Interspeech 2024 paper "Lightweight Transducer Based on Frame Level Criterion".

Python 7 2 Updated Dec 11, 2024

[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 96 6 Updated Nov 19, 2024

Official release of StyleTalk dataset.

60 2 Updated Jul 1, 2024

Interspeech2024 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models

Python 14 Updated Jul 28, 2024

Inference and training library for high-quality TTS models.

Python 4,807 493 Updated Dec 10, 2024

The Art of Debugging

C 827 36 Updated Aug 3, 2024

Read e-books in style

JavaScript 6,510 297 Updated Dec 20, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 457 26 Updated Nov 19, 2024

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Python 7 1 Updated Dec 11, 2024
Python 242 14 Updated Oct 3, 2024

Taiwanese Speech Synthesis with Tacotron2

Python 19 5 Updated Oct 2, 2022

**Official** 李宏毅 (Hung-yi Lee) 機器學習 Machine Learning 2021 Spring

Jupyter Notebook 850 322 Updated Nov 9, 2023

Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊

Python 94 16 Updated Oct 14, 2022

《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm

Python 82 6 Updated Oct 19, 2023

**Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speech processing with prompting paradigm

Python 97 8 Updated Aug 25, 2023

《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》

74 5 Updated Jun 9, 2023

🧠 A study guide to learn about Transformers

1,545 146 Updated Jun 3, 2023

An open source implementation of CLIP.

Python 10,631 1,003 Updated Dec 4, 2024

Meaningful titles for tabs and PDF downloads! Also supports tab search.

JavaScript 293 19 Updated Oct 20, 2024

Drive a browser with GPT-3

Python 1,914 279 Updated Jun 9, 2024

Port of OpenAI's Whisper model in C/C++

C++ 36,476 3,735 Updated Dec 22, 2024

Implementation of multi-level Contrastive Predictive Coding (CPC) methods

Python 19 3 Updated Jan 12, 2023

Zero-Resource Speech Discovery, Search, and Evaluation Tools

C 29 17 Updated Aug 6, 2015

Book in preparation: introduction to theoretical computer science

TeX 919 185 Updated Mar 18, 2024

Segment an audio file and obtain utterance alignments. (Python package)

Python 324 29 Updated May 15, 2024

X (weighted / probabilistic) Context-Free Grammars

Python 25 2 Updated Jan 30, 2024

Large, modern dataset for speech recognition

Shell 653 62 Updated Feb 26, 2024

MinT: Minimal Transformer Library and Tutorials

Python 251 14 Updated Jul 26, 2022

Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

Python 35 3 Updated Apr 14, 2022