GitHub - shop668/rnnoise at f5cc425fd6fc20f1e610cc4a055b554e7b2913e3

forked from xiph/rnnoise

Recurrent neural network for audio noise reduction

BSD-3-Clause license

RNNoise is a noise suppression library based on a recurrent neural network.
A description of the algorithm is provided in the following paper:

J.-M. Valin, A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech
Enhancement, Proceedings of IEEE Multimedia Signal Processing (MMSP) Workshop,
arXiv:1709.08243, 2018.
https://arxiv.org/pdf/1709.08243.pdf

An interactive demo is available at: https://jmvalin.ca/demo/rnnoise/

To compile, just type:
% ./autogen.sh
% ./configure
% make

Optionally:
% make install

Note that the autogen.sh script will automatically download the model files
from the Xiph.Org servers, since those are too large to put in Git.

While it is meant to be used as a library, a simple command-line tool is
provided as an example. It operates on raw 16-bit (machine endian) mono
PCM files sampled at 48 kHz. It can be used as:

./examples/rnnoise_demo <noisy speech> <output denoised>

The output is also a 16-bit raw PCM file.
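
Since the demo reads and writes headerless PCM, a WAV recording has to be
stripped of its header before use, and the denoised output needs a header
added back before most players will open it. The following is a minimal
Python sketch of both directions, assuming the WAV file is already 48 kHz,
16-bit and mono (no resampling or downmixing is attempted). The helper names
wav_to_raw and raw_to_wav are hypothetical and not part of RNNoise.

```python
import array
import sys
import wave

SAMPLE_RATE = 48000  # rate expected by rnnoise_demo, per the text above


def wav_to_raw(wav_path, raw_path):
    """Write the samples of a 48 kHz, 16-bit, mono WAV file as raw PCM."""
    with wave.open(wav_path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, SAMPLE_RATE):
            raise ValueError("expected 48 kHz, 16-bit, mono WAV input")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian; the demo wants machine endian
    with open(raw_path, "wb") as raw:
        samples.tofile(raw)
    return len(samples)


def raw_to_wav(raw_path, wav_path):
    """Wrap machine-endian 16-bit mono raw PCM in a 48 kHz WAV header."""
    samples = array.array("h")
    with open(raw_path, "rb") as raw:
        samples.frombytes(raw.read())
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data must be little-endian
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples.tobytes())
    return len(samples)
```

A typical round trip would call wav_to_raw("noisy.wav", "noisy.raw"), run
./examples/rnnoise_demo noisy.raw denoised.raw, and then call
raw_to_wav("denoised.raw", "denoised.wav"). The file names here are
placeholders.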

The latest version of the source is available from
https://gitlab.xiph.org/xiph/rnnoise . The GitHub repository
is a convenience copy.

== TRAINING ==

To train an RNNoise model, you need both clean speech data and noise data.
Both need to be sampled at 48 kHz, in 16-bit PCM format (machine endian).
Assuming the files are called speech.pcm and noise.pcm, start by generating
the training data with

% ./dump_features speech.pcm noise.pcm features.f32 <count>
where <count> is the number of sequences to process. The number of sequences
should be at least 10000, but the more the better.
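
Because raw PCM files carry no header, a wrong sample rate or sample width
goes unnoticed until the results sound wrong. As a quick sanity check before
generating features, the duration implied by the file size can be compared
with what you expect. This is a minimal Python sketch that assumes mono
data (the text above only specifies 48 kHz and 16-bit); pcm_duration_seconds
is a hypothetical helper, not part of RNNoise.

```python
import os

SAMPLE_RATE = 48000      # samples per second, per the text above
BYTES_PER_SAMPLE = 2     # 16-bit PCM


def pcm_duration_seconds(path):
    """Return the duration of a raw 48 kHz, 16-bit PCM file (mono assumed)."""
    size = os.path.getsize(path)
    if size % BYTES_PER_SAMPLE:
        raise ValueError("file size is not a whole number of 16-bit samples")
    return size / (BYTES_PER_SAMPLE * SAMPLE_RATE)
```

For example, pcm_duration_seconds("speech.pcm") / 3600 gives the amount of
clean speech in hours under these assumptions.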

To make the feature generation faster, you can use the script provided in
scripts/dump_features_parallel.sh

To use it:
% scripts/dump_features_parallel.sh ./dump_features speech.pcm noise.pcm features.f32 <count> <nb_processes>
which will run <nb_processes> processes, each generating <count> sequences, and
concatenate the outputs into a single file.
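
The run-in-parallel-then-concatenate behaviour described above can be
sketched in Python as follows. This is an illustration of the idea only,
not a transcription of the shipped shell script: the function name, the
temporary part-file naming (output.0, output.1, ...) and the error handling
are all assumptions.

```python
import os
import shutil
import subprocess


def dump_features_parallel(dump_cmd, speech, noise, output, count, nb_processes):
    """Run nb_processes copies of dump_cmd, then concatenate their outputs.

    dump_cmd is the argv prefix of the feature dumper, e.g. ["./dump_features"].
    Each process is asked for `count` sequences and writes its own part file.
    """
    parts = [f"{output}.{i}" for i in range(nb_processes)]  # hypothetical naming
    procs = [
        subprocess.Popen([*dump_cmd, speech, noise, part, str(count)])
        for part in parts
    ]
    # Wait for every process before checking for failures.
    codes = [proc.wait() for proc in procs]
    if any(codes):
        raise RuntimeError(f"feature dump failed with exit codes {codes}")
    # Concatenate the part files in order, removing each once copied.
    with open(output, "wb") as merged:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, merged)
            os.remove(part)
    return output
```

Under these assumptions, calling
dump_features_parallel(["./dump_features"], "speech.pcm", "noise.pcm", "features.f32", 10000, 8)
would leave a single features.f32 holding the output of all eight processes.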

Once the feature file is computed, you can start the training with:
% python3 train_rnnoise.py features.f32 output_directory

The training will produce .pth files, e.g. rnnoise_200.pth.
The next step is to convert the model to C files using:

% python3 dump_rnnoise_weights.py --quantize rnnoise_200.pth rnnoise_c

which will produce the rnnoise_data.c and rnnoise_data.h files in the
rnnoise_c directory.

Copy these files to src/ and then build RNNoise using the instructions above.