Skip to content

Commit

Permalink
add real-time gcc-nmf notebook
Browse files Browse the repository at this point in the history
  • Loading branch information
seanwood committed Aug 6, 2017
1 parent 6e3276b commit d6da3ac
Show file tree
Hide file tree
Showing 2 changed files with 374 additions and 0 deletions.
Binary file added notebooks/images/RealtimeGCCNMFScreenshot.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
374 changes: 374 additions & 0 deletions notebooks/realtimeSpeechEnhancement.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,374 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!---\n",
"The MIT License (MIT)\n",
"\n",
"Copyright (c) 2017 Sean UN Wood\n",
"\n",
"Permission is hereby granted, free of charge, to any person obtaining a copy\n",
"of this software and associated documentation files (the \"Software\"), to deal\n",
"in the Software without restriction, including without limitation the rights\n",
"to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n",
"copies of the Software, and to permit persons to whom the Software is\n",
"furnished to do so, subject to the following conditions:\n",
"\n",
"The above copyright notice and this permission notice shall be included in all\n",
"copies or substantial portions of the Software.\n",
"\n",
"THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n",
"IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n",
"FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n",
"AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n",
"LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n",
"OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n",
"SOFTWARE.\n",
"--->\n",
"\n",
"# Real-time Speech Enhancement with GCC-NMF\n",
"\n",
"#### Sean UN Wood, August 2017\n",
"\n",
"In this iPython notebook, we present a real-time Python implementation of the GCC-NMF speech enhancement algorithm presented in:\n",
"\n",
"- Sean UN Wood and Jean Rouat, [*Real-time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson*](https://www.researchgate.net/profile/Sean_Wood7/publication/318946628_Real-time_Speech_Enhancement_with_GCC-NMF_Demonstration_on_the_Raspberry_Pi_and_NVIDIA_Jetson/links/59872715aca27266ada22465/Real-time-Speech-Enhancement-with-GCC-NMF-Demonstration-on-the-Raspberry-Pi-and-NVIDIA-Jetson.pdf), **Interspeech 2017 Show and Tell Demonstrations**.\n",
"\n",
"that built on previous work:\n",
" - Sean UN Wood and Jean Rouat, [*Real-time Speech Enhancement with GCC-NMF*](https://www.researchgate.net/publication/318511757_Real-time_Speech_Enhancement_with_GCC-NMF), **Interspeech 2017**.\n",
"\n",
"This is essentially a real-time implementation of the [Online Speech Enhancement](https://nbviewer.jupyter.org/github/seanwood/gcc-nmf/blob/master/notebooks/onlineSpeechEnhancement.ipynb) notebook presented previously, with the addition of soft GCC-NMF mask generation.\n",
"\n",
"\n",
"## Notebook Overview\n",
"\n",
"1. Preliminary setup: Python dependencies\n",
"2. Real-time GCC-NMF: Graphical User Interface (GUI)\n",
"3. Real-time GCC-NMF: Command-line Interface (CLI)\n",
"4. Configuration parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Preliminary setup\n",
"### Dependencies\n",
"In addition to [numpy](http://www.numpy.org/) and [scipy](https://www.scipy.org/), a few additional dependencies are required:\n",
"\n",
"1. [Theano](http://deeplearning.net/software/theano/) for its GPU acceleration and optimizing compiler.\n",
"2. [PyAudio](https://people.csail.mit.edu/hubert/pyaudio/) for real-time audio playback.\n",
"\n",
"To run the graphical user interface, we also require:\n",
"1. [PyQt](https://riverbankcomputing.com/software/pyqt/intro) Python bindings for the [Qt](https://www.qt.io/) application framework.\n",
"2. [pyqtgraph](http://www.pyqtgraph.org/) scientific graphics and GUI library.\n",
"\n",
"### Installing dependencies\n",
"All dependencies can be installed with [pip](https://pip.pypa.io/en/stable/), either on the command line:\n",
"\n",
"`$ pip install numpy scipy theano pyaudio`\n",
"\n",
"`$ pip install pyqt5 pyqtgraph`\n",
"\n",
"or programatically from within this notebook or a Python interpreter:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pip"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: numpy in /usr/local/lib/python3.6/site-packages\n",
"Requirement already satisfied: scipy in /usr/local/lib/python3.6/site-packages\n",
"Requirement already satisfied: numpy>=1.8.2 in /usr/local/lib/python3.6/site-packages (from scipy)\n",
"Requirement already satisfied: theano in /usr/local/lib/python3.6/site-packages\n",
"Requirement already satisfied: numpy>=1.9.1 in /usr/local/lib/python3.6/site-packages (from theano)\n",
"Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.6/site-packages (from theano)\n",
"Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/site-packages (from theano)\n",
"Requirement already satisfied: pyaudio in /usr/local/lib/python3.6/site-packages\n",
"\n",
"Finished installing required dependencies\n"
]
}
],
"source": [
"# required dependencies\n",
"pip.main(['install', 'numpy'])\n",
"pip.main(['install', 'scipy'])\n",
"pip.main(['install', 'theano'])\n",
"pip.main(['install', 'pyaudio'])\n",
"print('\\nFinished installing required dependencies')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pyqt5 in /usr/local/lib/python3.6/site-packages\n",
"Requirement already satisfied: sip<4.20,>=4.19.3 in /usr/local/lib/python3.6/site-packages (from pyqt5)\n",
"Requirement already satisfied: pyqtgraph in /usr/local/lib/python3.6/site-packages\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.6/site-packages (from pyqtgraph)\n",
"\n",
"Finished installing GUI dependencies\n"
]
}
],
"source": [
"# dependencies for GUI\n",
"pip.main(['install', 'pyqt5'])\n",
"pip.main(['install', 'pyqtgraph'])\n",
"print('\\nFinished installing GUI dependencies')"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# 2. Real-time GCC-NMF: with GUI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Preliminary imports for logging"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import logging\n",
"logging.getLogger().setLevel(logging.INFO)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Running real-time GCC-NMF\n",
"\n",
"The real-time GCC-NMF executable can be found at `gccNMF/realtime/runRealtimeGCCNMF.py`. At the root directory of this repo, we can start the program as follows:\n",
"\n",
"`$ python gccNMF/realtime/runRealtimeGCCNMF.py`\n",
"\n",
"We can also start it programatically:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GCCNMFConfig: loading configuration params...\n",
"TDOA\n",
" numTDOAs: 64\n",
" numTDOAHistory: 128\n",
" numSpectrogramHistory: 128\n",
" gccPHATNLAlpha: 2.0\n",
" gccPHATNLEnabled: False\n",
" microphoneSeparationInMetres: 0.1\n",
" targetTDOAEpsilon: 5.0\n",
" targetTDOABeta: 2.0\n",
" targetTDOANoiseFloor: 0.0\n",
"Audio\n",
" numChannels: 2\n",
" sampleRate: 16000\n",
" deviceIndex: None\n",
"STFT\n",
" windowSize: 1024\n",
" hopSize: 512\n",
" blockSize: 512\n",
"NMF\n",
" dictionarySize: 64\n",
" dictionarySizes: [64, 128, 256, 512, 1024]\n",
" dictionaryType: Pretrained\n",
" numHUpdates: 0\n",
"GCCNMFPretraining: Loading pretrained W (size 64): /gcc-nmf/data/pretrainedW/W_64.npy\n",
"GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_64.npy, creating...\n",
"GCCNMFPretraining: Loading pretrained W (size 128): /gcc-nmf/data/pretrainedW/W_128.npy\n",
"GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_128.npy, creating...\n",
"GCCNMFPretraining: Loading pretrained W (size 256): /gcc-nmf/data/pretrainedW/W_256.npy\n",
"GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_256.npy, creating...\n",
"GCCNMFPretraining: Loading pretrained W (size 512): /gcc-nmf/data/pretrainedW/W_512.npy\n",
"GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_512.npy, creating...\n",
"GCCNMFPretraining: Loading pretrained W (size 1024): /gcc-nmf/data/pretrainedW/W_1024.npy\n",
"GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_1024.npy, creating...\n",
"RealtimeGCCNMF: Starting with audio path: /gcc-nmf/data/dev_Sq1_Co_A_mix.wav\n",
"Loading interface with audio path: /gcc-nmf/data/dev_Sq1_Co_A_mix.wav\n",
"GCCNMFInterface: setting dictionarySize: 64\n",
"RealtimeGCCNMFInterfaceWindow: closing...\n",
"Window closed\n",
"GCCNMFProcessor: received terminate\n",
"Audio process joined\n",
"GCCNMF process joined\n",
"Done!\n"
]
}
],
"source": [
"from gccNMF.realtime.runRealtimeGCCNMF import RealtimeGCCNMF\n",
"RealtimeGCCNMF()\n",
"print('Done!')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Startup may take a little while the first time it runs, as we pre-learn several NMF dictionaries of varying size. The dictionaries are saved to `data/realtimeGCCNMFPretrainedW`, so subsequent launches will be much quicker.\n",
"\n",
"Once the window opens, click the `Play` button (or hit space) and you should see something like this:\n",
"\n",
"![Real-time GCC-NMF Interface](images/RealtimeGCCNMFScreenshot.png \"Real-time GCC-NMF Interface\")\n",
"\n",
"The images roll with the input audio in waterfall-style. At the left, we see the input spectrogram on top, and the output spectrogram on the bottom. At the right, we have the GCC-PHAT angular spectrogram of the input on top (numTDOA x time), and the NMF mask on the bottom (numAtoms x time). In the center-bottom, we see the currently selected NMF dictionary (numAtom x frequency). Finally, at the center-top, we have the controls over the GCC-NMF masking function, dictionary size, number of coefficient inference updates per frame, as well as buttons to control playback, enable/disable enhancement, and toggle the info strings above the images."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# 3. Real-time GCC-NMF: no GUI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Real-time GCC-NMF can also be run without the GUI, by passing the `--no-gui` as an argument on the command line:\n",
"\n",
"`$ python gccNMF/realtime/runRealtimeGCCNMF.py --no-gui`\n",
"\n",
"or programatically by instantiating the `RealtimeGCCNMFNoGUI` class instead of the `RealtimeGCCNMF` class we used above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": false
},
"outputs": [],
"source": [
"from gccNMF.realtime.runRealtimeGCCNMF import RealtimeGCCNMFNoGUI\n",
"RealtimeGCCNMFNoGUI()\n",
"print('Done!')"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# 4. Configuration parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Input and Config files\n",
"GCC-NMF parameters can be set via a configuration file, with a default config file generated as `gccNMF.config` at first launch. This config file can either be modified, or a new config file may be specified. The input wav file may also be specified in a similar manner.\n",
"\n",
"The config and input file paths can either be specified with the following optional command line arguments,\n",
"\n",
"`$ python gccNMF/realtime/runRealtimeGCCNMF.py --input <path/to/wav/file> --config <path/to/config/file>`\n",
"\n",
"or programatically as,\n",
"\n",
"`RealtimeGCCNMF(audioPath='<path/to/wav/file>', configPath='<path/to/config/file>')`\n",
"\n",
"#### Help\n",
"\n",
"`$ python gccNMF/realtime/runRealtimeGCCNMF.py --help`\n",
"\n",
"`usage: runRealtimeGCCNMF.py [-h] [-i INPUT] [-c CONFIG] [--no-gui]`\n",
"\n",
"`Real-time GCC-NMF Speech Enhancement`\n",
"\n",
"`optional arguments:`\n",
"\n",
"`-h, --help` \n",
"*show this help message and exit* \n",
"\n",
"`-i INPUT, --input INPUT` \n",
"*input wav file path* \n",
"\n",
"`-c CONFIG, --config CONFIG` \n",
"*config file path* \n",
"\n",
"`--no-gui` \n",
"*no user interface mode*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

0 comments on commit d6da3ac

Please sign in to comment.