This directory contains example scripts to train ASR models using methods such as Connectionist Temporal Classification (CTC) loss and RNN Transducer (RNN-T) loss.
Examples for speech pre-training via self-supervised learning, voice activity detection, and other sub-domains are also included in this directory.
The inference scripts in this directory execute in the following order. When preparing your own inference scripts, please follow this order for correct inference.
graph TD
A[Hydra Overrides + Config Dataclass] --> B{Config}
B --> |Init| C[Model]
B --> |Init| D[Trainer]
C & D --> E[Set trainer]
E --> |Optional| F[Change Transducer Decoding Strategy]
F --> H[Load Manifest]
E --> |Skip| H
H --> I["model.transcribe(...)"]
I --> J[Write output manifest]
K[Ground Truth Manifest]
J & K --> |Optional| L[Evaluate CER/WER]
During restoration of the model, you may pass the Trainer to the restore_from / from_pretrained call, or set it after the model has been initialized by calling model.set_trainer(Trainer).
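
The optional "Evaluate CER/WER" step at the end of the flow can be illustrated with a small, dependency-free sketch. It is not the implementation used by the scripts in this directory. It assumes a JSON-lines manifest in which each entry holds the ground truth and the prediction side by side, and the key names `text` and `pred_text`, as well as the helper names `edit_distance` and `error_rates`, are assumptions chosen for illustration.

```python
import json


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def error_rates(manifest_lines, ref_key="text", hyp_key="pred_text"):
    """Return (WER, CER) aggregated over all entries of a JSON-lines manifest.

    The key names are assumptions; adjust them to match your manifest.
    """
    word_errs = word_total = char_errs = char_total = 0
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        ref, hyp = entry[ref_key], entry[hyp_key]
        word_errs += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
    # Guard against empty references to avoid division by zero.
    return word_errs / max(word_total, 1), char_errs / max(char_total, 1)


# Hypothetical manifest entries, inlined here instead of being read from a file.
sample = [
    '{"audio_filepath": "a.wav", "text": "the cat sat", "pred_text": "the cat sat"}',
    '{"audio_filepath": "b.wav", "text": "hello world", "pred_text": "hello word"}',
]
wer, cer = error_rates(sample)
```

Errors are summed over the whole manifest before dividing, so long utterances weigh more than short ones. Averaging per-utterance rates instead would give a different number.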