End-to-End Automatic Speech Recognition (ASR) with CTC

This repository contains baseline models (3- to 5-layer Bi-LSTMs) for ASR tasks on standard speech datasets (TIMIT, WSJ, Switchboard).

Model

3-layer or 5-layer BiLSTM + softmax layer + CTC loss

Three dataloaders, one for each of the three datasets
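
To make the softmax + CTC objective listed above concrete, here is a minimal pure-Python sketch of the CTC forward (alpha) recursion. It is an illustration only, not the code used in this repository: the function name `ctc_neg_log_likelihood`, the choice of index 0 for the blank label, and the plain-probability arithmetic are all assumptions made for readability. Practical implementations work in log space for numerical stability.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (illustrative sketch).

    probs:  list of T frames; each frame is a list of softmax probabilities,
            one per output label (including the blank).
    target: list of label indices without blanks.
    blank:  index of the blank label (assumed to be 0 here).
    """
    # Extended target with blanks interleaved: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all alignments of the frames seen so far
    # that end at position s of the extended target.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            # Skipping the blank between two labels is allowed only when
            # the two labels differ.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf


if __name__ == "__main__":
    # Two frames, two labels (0 = blank, 1 = "a"), uniform softmax outputs.
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(ctc_neg_log_likelihood(uniform, [1]))
```

In the toy example, three of the four possible two-frame alignments collapse to "a" ("aa", "a-", "-a"), so the likelihood is 0.75. A target that cannot fit in the available frames has zero likelihood and the function returns infinity.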

Split   Switchboard (CER)   WSJ (CER)   TIMIT (PER)
Dev     11.86               6.1         13.429
Test    -                   4.6         15.967

CER = character error rate; PER = phone error rate. No test result is reported for Switchboard.
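
For readers unfamiliar with the metrics, the sketch below shows one common way to compute an edit-distance-based error rate such as CER or PER. It assumes the figures above are percentages and that the rate is total edit distance divided by total reference length; neither detail is stated in this README, and the helper names `edit_distance` and `error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or phones)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Corpus-level error rate in percent (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_len = sum(len(r) for r in refs)
    return 100.0 * total_edits / total_len


if __name__ == "__main__":
    # Character-level example (CER): one substitution in five characters.
    print(error_rate(["hello"], ["hallo"]))
    # Phone-level example (PER): one deleted phone out of four.
    print(error_rate([["sil", "k", "ae", "t"]], [["sil", "k", "t"]]))
```

Passing strings gives a character-level rate, while passing lists of phone symbols gives a phone-level rate with the same code.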

Visualization of LSTM hidden units before and after pretraining.
