# Train your own CTC model!
You will need the following packages installed before you can train a model using this code. You may have to change `PYTHONPATH` to include the directories of your new packages.
- **theano**

  The underlying deep learning Python library. We suggest using the bleeding-edge version:

  ```
  git clone https://github.com/Theano/Theano
  ```

  Follow the instructions at http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions, or simply:

  ```
  cd Theano; python setup.py install
  ```

- **keras**

  A wrapper over Theano that provides convenient functions for building networks. Once again, we suggest using the bleeding-edge version. Make sure you install it with support for `hdf5`; we make use of that to save models.

  ```
  git clone https://github.com/fchollet/keras
  ```

  Follow the installation instructions at https://github.com/fchollet/keras, or simply:

  ```
  cd keras; python setup.py install
  ```

- **warp-ctc**

  Contains the main implementation of the CTC cost function.

  ```
  git clone https://github.com/baidu-research/warp-ctc
  ```

  To install it, follow the instructions at https://github.com/baidu-research/warp-ctc.

- **theano-warp-ctc**

  A Theano wrapper over warp-ctc.

  ```
  git clone https://github.com/sherjilozair/ctc
  ```

  Follow the installation instructions at https://github.com/sherjilozair/ctc.

- **Others**

  You may require some additional packages. Install the Python requirements through `pip`:

  ```
  pip install soundfile
  ```

  On Ubuntu, `avconv` (used here for audio format conversions) requires `libav-tools`:

  ```
  sudo apt-get install libav-tools
  ```
We will make use of the LibriSpeech ASR corpus to train our models. Use the `download.sh` script to download this corpus (~65GB). Use `flac_to_wav.sh` to convert any `flac` files to `wav`.
We make use of a JSON file that aggregates all data for training, validation and testing. Once you have a corpus, create a description file that is a JSON-lines file in the following format:

```
{"duration": 15.685, "text": "spoken text label", "key": "/home/username/LibriSpeech/train-clean-360/5672/88367/5672-88367-0031.wav"}
{"duration": 14.32, "text": "ground truth text", "key": "/home/username/LibriSpeech/train-other-500/8678/280914/8678-280914-0009.wav"}
```
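A description file in this format can be generated with a short script. The sketch below is illustrative only (the helper names and the wav/transcript pairing are not part of this repo); it computes durations with Python's standard `wave` module, which gives the same frames-over-sample-rate value as `soxi -D`.

```python
import json
import wave

def wav_duration(path):
    # Duration in seconds: frames / sample rate.
    with wave.open(path, 'rb') as w:
        return w.getnframes() / float(w.getframerate())

def build_description(pairs, out_path):
    # pairs: iterable of (wav_path, transcript) tuples from your corpus.
    # Writes one JSON object per line, matching the format shown above.
    with open(out_path, 'w') as out:
        for wav_path, text in pairs:
            record = {"duration": round(wav_duration(wav_path), 3),
                      "text": text,
                      "key": wav_path}
            out.write(json.dumps(record) + "\n")
```

For LibriSpeech you would pair each `wav` file with its line from the corresponding transcript file before calling the builder.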
Each line is a JSON object. We will make use of the durations to construct a curriculum for the first epoch (shorter utterances are easier). You can query the duration of a file using `soxi -D filename`. By default, we split this data as 80% training, 10% validation and 10% testing; you can adjust these proportions in `data_generator.py`.
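The duration-based curriculum and the default 80/10/10 split can be sketched as follows. The real logic lives in `data_generator.py`, so treat the function names and details here as illustrative only:

```python
import json

def load_description(path):
    # Read a JSON-lines description file into a list of dicts.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def split_data(records, train=0.8, valid=0.1):
    # 80/10/10 by default; whatever remains after train+valid is the test set.
    n_train = int(len(records) * train)
    n_valid = int(len(records) * valid)
    return (records[:n_train],
            records[n_train:n_train + n_valid],
            records[n_train + n_valid:])

def first_epoch_order(records):
    # Curriculum for the first epoch: shortest utterances first.
    return sorted(records, key=lambda r: r["duration"])
```

Later epochs would shuffle the training records instead of sorting them.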
Finally, let's train a model!

```
python train.py corpus.json ./save_my_model_here
```

This will checkpoint a model every few iterations into the directory you specify.