NVIDIA NeMo is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS). NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance. While TAO Toolkit is the recommended path for typical users of Riva, some developers may prefer to use NeMo because it exposes more of the model and PyTorch internals. Riva supports the ability to import models trained in NeMo.
This repo provides Nemo2Riva: command-line tool to export trained NeMo models saved in .nemo format, to Riva input format (.riva) to be deployed in Riva.
Models trained in NVIDIA NeMo have the format .nemo
. To use these models in Riva, convert the model checkpoints to .riva
format for building and deploying with Riva ServiceMaker using the nemo2riva
tool. The nemo2riva
tool is currently packaged and available via the Riva Quick Start scripts.
-
Follow the NeMo installation instructions to set up a NeMo environment; version 1.10.0 or greater. From within your NeMo environment, run:
pip3 install nvidia-pyindex pip3 install nemo2riva nemo2riva --out /NeMo/<MODEL_NAME>.riva /NeMo/<MODEL_NAME>.nemo
For additional information and usage, run:
nemo2riva --help
Usage:
```
nemo2riva [-h] [--out OUT] [--validate] [--schema SCHEMA] [--format FORMAT] [--verbose VERBOSE] [--key KEY] source
When converting NeMo models to Riva `.eff` input format, passing the input `.nemo` as a parameter creates `.riva`.
If no `--format` is passed, the Riva-preferred format for the supplied model architecture is selected automatically.
The format is also derived from schema if the `--schema` argument is supplied, or if `nemo2riva` is able to find the schema for this NeMo model
among known models - there is a set of YAML files in the `nemo2riva/validation_schemas` directory, or you can add your own.
If the `--key` argument is passed, the model graph in the output EFF file is encrypted with that key.
positional arguments:
: source Source .nemo file
optional arguments:
: -h
--help
Show this help message and exit
--out
OUT
Location to write resulting Riva EFF input to (default: `None`)
--validate
Validate using schemas (default: `False`)
--schema
SCHEMA
Schema file to use for validation (default: `None`)
--format
FORMAT
Force specific export format: `ONNX|TS|CKPT|NEMO` (default: `None`)
--verbose
VERBOSE
Verbose level for logging, numeric (default: `None`)
--key
KEY
Encryption key or file (default: `None`)
```
Modality | Architecture | nemo2riva command |
Automatic Speech Recognition | Parakeet-CTC | nemo2riva --key tlt_encode --out <path to save .riva model> --onnx-opset 18 <path to .nemo model> |
Conformer-CTC | ||
Parakeet-TDT | nemo2riva --key tlt_encode --out <path to save .riva model> --format nemo <path to .nemo model> | |
Parakeet-RNNT | ||
Conformer-RNNT | ||
Canary | ||
Note: Extract CTC/RNNT head from Hybrid ASR using this script before running nemo2riva | ||
Text to Speech | HiFiGAN | nemo2riva --key tlt_encode --out <path to save .riva model> <path to .nemo model> |
RadTTS | ||
Fastpitch | nemo2riva --key tlt_encode --out <path to save .riva model> --onnx-opset 14 <path to .nemo model> | |
Voice Activity Detection | Segment VAD | nemo2riva --key tlt_encode --out <path to save .riva model> --onnx-opset 18 <path to .nemo model> |
FrameVAD | ||
Punctuation and Capitalization | Bert-Base | nemo2riva --key tlt_encode --out <path to save .riva model> --onnx-opset 18 <path to .nemo model> |
Bert-Large | ||
Neural Machine Translation | Megatron | nemo2riva --key tlt_encode --max-dim 1024 --out <path to save .riva model> <path to .nemo model> |