End-to-End Korean Speech Recognition

Character-unit based End-to-End Korean Speech Recognition

Documentation

Intro

This is a project for Korean speech recognition using LAS (Listen, Attend and Spell) models
implemented in PyTorch.
We appreciate any kind of feedback or contribution.

Roadmap

Speech recognition is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.

We mainly referred to the following papers.

「Listen, Attend and Spell」

「State-of-the-art Speech Recognition with Sequence-to-Sequence Models」

「SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition」

If you want to study audio features, we recommend this paper.

「Voice Recognition Using MFCC Algorithm」

Our project is based on a Seq2seq architecture with attention.
Seq2seq is a fast-evolving field, with new techniques and architectures being published frequently.
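
To make the attention idea concrete, here is a minimal sketch of a single attention step written in plain Python. It assumes simple dot-product scoring between a decoder state and each encoder output; this is a simplification for illustration and is not necessarily the mechanism implemented in this project. The names `dot_product_attention`, `decoder_state`, and `encoder_outputs` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(decoder_state, encoder_outputs):
    """Return (context, weights) for one decoder step.

    decoder_state:   list of floats (hypothetical decoder hidden state)
    encoder_outputs: list of equally sized float lists, one per time step
    """
    # Score each encoder time step by its dot product with the decoder state.
    scores = [
        sum(q * k for q, k in zip(decoder_state, enc))
        for enc in encoder_outputs
    ]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted sum of the encoder outputs.
    dim = len(encoder_outputs[0])
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return context, weights


# Toy values, made up for illustration only.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
context, weights = dot_product_attention(decoder_state, encoder_outputs)
print("weights:", weights)
print("context:", context)
```

In this toy input, the first and third encoder steps align equally well with the decoder state, so they receive equal weight and the second step receives less.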

We use the AI Hub 1000h dataset, which contains 1,000 hours of Korean voice data. The project is currently in progress.
At present, our top model has recorded an 80% CRR, and we are working toward a higher recognition rate.

Our model has also recorded a 91% CRR on the Kaldi-zeroth dataset.

(CRR: Character Recognition Rate)
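
The README does not spell out how CRR is computed. The sketch below assumes the common convention that CRR equals 1 minus the character error rate, where the error rate is the Levenshtein edit distance between reference and hypothesis divided by the reference length. The function names are hypothetical, and the project's own scoring code may differ (for example, in how spaces are treated).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_recognition_rate(ref, hyp):
    """Assumed definition: CRR = 1 - edit_distance / len(reference)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# One wrong syllable out of five in the reference.
print(character_recognition_rate("안녕하세요", "안녕하세오"))
```

Under this definition, a hypothesis that is much longer than the reference can produce a negative value, so some implementations clip the result at 0.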

We are constantly updating the progress of the project on the Wiki page. Please check this page.

Installation

This project recommends Python 3.7 or higher.
We recommend creating a new virtual environment for this project (using virtualenv or conda).

Prerequisites

NumPy: pip install numpy (refer here if you have problems installing NumPy)
PyTorch: refer to the PyTorch website to install the version that matches your environment
pandas: pip install pandas (refer here if you have problems installing pandas)
librosa: pip install librosa (refer here if you have problems installing librosa)
tqdm: pip install tqdm (refer here if you have problems installing tqdm)
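
As a quick sanity check after installing, a small script along these lines can report which prerequisites are not yet importable. This is a sketch and not part of the project; the helper name `find_missing` is hypothetical, and the module names are the usual import names for the packages listed above (PyTorch is imported as `torch`).

```python
import importlib.util

# Import names for the prerequisites listed above.
REQUIRED_MODULES = ["numpy", "torch", "pandas", "librosa", "tqdm"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All prerequisites are importable.")
```

The check only confirms that each package can be located; it does not verify versions or that PyTorch was built for your hardware.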

Install from source

Currently, we only support installation from source code using setuptools. Check out the source code and run the
following commands:

pip install -r requirements.txt

python setup.py build
python setup.py install

Get Started

Preparation before Training

Refer here before training.
The above document is written in Korean.
We will also provide an English version as soon as possible.

If you already have another dataset, please modify the dataset path in definition.py as appropriate.

Train and Test

To start training, run train.py.
To test a trained model, run test.py.

You can set the configuration in config.py.
An explanation of the configuration is here.

Incorporating External Language Model in Performance Test

We describe how to incorporate an external language model in the performance test.
If you are interested in this topic, please check here.
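
For readers unfamiliar with the idea, the sketch below illustrates shallow fusion, one of the approaches analyzed in reference [4]: the recognizer's log-probability for a hypothesis is combined with a weighted log-probability from an external language model. This is an assumption about the general technique, not a description of this project's implementation; the linked document may use a different scheme. The function names, the weight of 0.3, and all probabilities are made up for illustration, and the fusion is applied to whole hypotheses rather than at each decoding step to keep the example short.

```python
import math


def shallow_fusion_score(asr_log_prob, lm_log_prob, lm_weight=0.3):
    """Combine recognizer and language-model log-probabilities."""
    return asr_log_prob + lm_weight * lm_log_prob


def rescore(candidates, lm_weight=0.3):
    """Pick the best hypothesis text from (text, asr_log_prob, lm_log_prob) tuples."""
    best = max(
        candidates,
        key=lambda c: shallow_fusion_score(c[1], c[2], lm_weight),
    )
    return best[0]


# Hypothetical candidates: A is preferred by the recognizer alone,
# but the language model finds B far more plausible.
candidates = [
    ("hypothesis A", math.log(0.50), math.log(0.01)),
    ("hypothesis B", math.log(0.40), math.log(0.20)),
]

print(rescore(candidates, lm_weight=0.0))  # recognizer score only
print(rescore(candidates, lm_weight=0.3))  # with the language model
```

With a weight of 0 the recognizer's choice wins; with a positive weight the language model can overturn it. In practice the weight is typically tuned on a held-out set.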

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub.
For live discussions, please go to our Gitter or contact us at [email protected].

We appreciate any kind of feedback or contribution. Feel free to proceed with small issues such as bug fixes and documentation improvements. For major contributions and new features, please discuss with the collaborators in the corresponding issues.

Code Style

We follow PEP 8 for code style. The style of docstrings is especially important because it is used to generate the documentation.

Reference

[1] 「Listen, Attend and Spell」 Paper
[2] 「State-of-the-art Speech Recognition with Sequence-to-Sequence Models」 Paper
[3] 「SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition」 Paper
[4] 「An analysis of incorporating an external language model into a sequence-to-sequence model」 Paper
[5] 「Voice Recognition Using MFCC Algorithm」 Paper
[6] IBM pytorch-seq2seq
[7] Character RNN Language Model
[8] A.I Hub Korean Voice Dataset
[9] Documentation

License

Copyright (c) 2020 Kai.Lib

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

wch18735/Korean-Speech-Recognition
