Commit

Merge pull request syl22-00#77 from jenweber/keyword-detection-demo
Fine tune keyword detection demo and documentation
syl22-00 authored Dec 5, 2016
2 parents 67cf722 + e2b5408 commit 6a457cd
Showing 2 changed files with 17 additions and 13 deletions.
22 changes: 13 additions & 9 deletions README.md
@@ -53,7 +53,7 @@ $ git submodule update

You will need:

-* [emscripten](https://github.com/kripken/emscripten) (which implies also node.js and LLVM-fastcomp compiler, see emscripten docs for instructions on how to get it),
+* [emscripten](https://github.com/kripken/emscripten) (which implies also node.js and LLVM-fastcomp compiler, see emscripten docs for instructions on how to get it),
* [CMake](http://www.cmake.org/).

The build is a classic CMake cross-compilation, using the toolchain provided by emscripten:
@@ -86,7 +86,7 @@ Make sure the files of the acoustic model are directly inside the `HMM_FOLDERS`:
$ ls *
model1:
feat.params mdef means sendump transition_matrices variances

model2:
feat.params mdef means sendump transition_matrices variances

@@ -131,7 +131,7 @@ You can interact with `pocketsphinx.js` directly if you need to, but it is proba

The file `pocketsphinx.js` can be directly included into an HTML file but as it is fairly large (a few MB, depending on the optimization level used during compilation and packaged files), downloading and loading it will take time and affect the UI thread. So, as explained later, you should use it inside a Web worker, for instance using `recognizer.js`.

-This API is based on `embind`, you should probably have a look at that [section in emscripten's docs](https://github.com/kripken/emscripten/wiki/embind) to understand how to interact with emscripten-generated JavaScript. Earlier versions of Pocketsphinx.js used a C-style API which is now deprecated, but it is still available in the `OBSOLETE_API` branch.
+This API is based on `embind`, you should probably have a look at that [section in emscripten's docs](https://github.com/kripken/emscripten/wiki/embind) to understand how to interact with emscripten-generated JavaScript. Earlier versions of Pocketsphinx.js used a C-style API which is now deprecated, but it is still available in the `OBSOLETE_API` branch.


As a first example, to create a new recognizer:
@@ -281,7 +281,11 @@ var id = ids.get(0); // This is the id assigned to the search
ids.delete();
```

-Note that there is a threshold that can be set to define how sensitive the search is. Add `["-kws_threshold", "2"]` for instance to the config object. Default value is 1, higher values means more likely to be spotted and more likely to get false positives.
+Note that there is a threshold that can be set to define how sensitive the search is. Add `["-kws_threshold", "1e-35"]` for instance to the config object. Values like "1e-50" mean that the keyword is more likely to be spotted but more likely to get false positives, while "1e-0" is restrictive and may miss actual keyword utterances. Experiment to find the ideal threshold. It varies greatly depending on the keyword itself, audio quality, and background noise.
+
+```javascript
+{command: 'initialize', data: [["-kws_threshold", "1e-35"]]}
+```
### d. Switching between grammars or keyword searches
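The updated threshold paragraph in the hunk above replaces the old integer value with values written in scientific notation and recommends experimenting. A minimal sketch of that experimentation, assuming you want to sweep exponents in steps of 5 between the two extremes the text mentions ("1e-50" and "1e-0"). The helper names `kwsThresholdCandidates` and `initializeMessage` are hypothetical; only the `{command: 'initialize', data: [["-kws_threshold", ...]]}` message shape comes from the README, which passes the threshold as a string.

```javascript
// Hypothetical helpers for trying several -kws_threshold values.

// Returns threshold strings "1e-<n>" for n = fromExp, fromExp - step, ..., >= toExp.
// With fromExp > toExp this goes from the most permissive value (more hits,
// more false positives) to the most restrictive one (may miss the keyword).
function kwsThresholdCandidates(fromExp, toExp, step) {
  if (!(step > 0)) throw new RangeError("step must be a positive number");
  const candidates = [];
  for (let n = fromExp; n >= toExp; n -= step) {
    candidates.push(`1e-${n}`);
  }
  return candidates;
}

// Builds the message posted to the recognizer worker, mirroring the README example.
function initializeMessage(threshold) {
  return {command: 'initialize', data: [["-kws_threshold", threshold]]};
}

const candidates = kwsThresholdCandidates(50, 0, 5); // "1e-50", "1e-45", ..., "1e-0"
const message = initializeMessage(candidates[3]);     // uses "1e-35"
```

Each candidate would be tried in its own session (initialize, add the keyword, speak a few test utterances), keeping the value that catches the keyword without too many false positives. For reference, the demo in this commit uses "1e-25" and the README example uses "1e-35".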
@@ -452,7 +456,7 @@ Note that words can have several pronunciation alternatives as explained in Sect
### d. Adding grammars or key phrases
-As described previously, any number of grammars or keyword searches can be added. The recognizer can then switch between them.
+As described previously, any number of grammars or keyword searches can be added. The recognizer can then switch between them.
A grammar can be added at once using a JavaScript object that contains the number of states, the first and last states, and an array of transitions, for instance:
@@ -481,7 +485,7 @@ var keyphrase = "HELLO WORLD";
recognizer.postMessage({command: 'addKeyword', data: keyphrase, callbackId: id});
```
-Just as with grammars, words should already be in the recognizer, and the id of the newly added search is given in the callback. As explained previously, you might want to adjust the sensitivity threshold when initializing the recognizer, for example by providing `["-kws_threshold", "2"]`.
+Just as with grammars, words should already be in the recognizer, and the id of the newly added search is given in the callback. As explained previously, you might want to adjust the sensitivity threshold when initializing the recognizer, for example by providing `["-kws_threshold", "1e-35"]`.
### e. Starting recognition
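In the `addKeyword` hunk above, the message carries a `callbackId`, and the id of the new search comes back in the callback. A minimal sketch of the bookkeeping this implies, in the spirit of the demo's `postRecognizerJob`. It assumes (this is not shown in the diff) that the worker replies with an object of the form `{id: <callbackId>, data: <result>}`. `CallbackRegistry` and `addKeyword` are hypothetical names; any object with a `postMessage` method, such as a `Worker`, can be passed in.

```javascript
// Hypothetical registry mapping callbackIds to callbacks.
// Assumes the worker answers with {id: <callbackId>, data: <result>}.
class CallbackRegistry {
  constructor() {
    this.nextId = 0;
    this.pending = new Map();
  }

  // Stores the callback and returns the numeric id to send as callbackId.
  add(callback) {
    const id = this.nextId++;
    this.pending.set(id, callback);
    return id;
  }

  // Runs and forgets the callback matching reply.id; returns true if one ran.
  dispatch(reply) {
    if (!reply || !this.pending.has(reply.id)) return false;
    const callback = this.pending.get(reply.id);
    this.pending.delete(reply.id);
    callback(reply.data);
    return true;
  }
}

// Hypothetical helper: posts an addKeyword message and registers onAdded,
// which would receive the id of the newly added search.
function addKeyword(worker, registry, keyphrase, onAdded) {
  const callbackId = registry.add(onAdded);
  worker.postMessage({command: 'addKeyword', data: keyphrase, callbackId: callbackId});
  return callbackId;
}
```

In a page, `registry.dispatch(event.data)` would be called from the worker's `onmessage` handler. Each callback is removed once it has run, so a duplicate reply with the same id is ignored instead of calling the callback twice.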
@@ -507,7 +511,7 @@ While data are processed, hypothesis will be sent back in a message in the form
### g. Ending recognition
-Recognition can be simply stopped using the `stop` command:
+Recognition can be simply stopped using the `stop` command:
```javascript
recognizer.postMessage({command: 'stop'});
@@ -662,7 +666,7 @@ function startup(onMessage) {
};
// This function is called first, it triggers
// a first postmessage, then adds the proper respond to
-// commands:
+// commands:
startup(function(event) {
switch(event.data.command){
//We deal with commands properly
@@ -770,7 +774,7 @@ PocketSphinx.js now uses PocketSphinx (and Sphinxbase) code as it is in its gith
# 9. License
-PocketSphinx licensing terms are included in the `pocketsphinx` and `sphinxbase` folders.
+PocketSphinx licensing terms are included in the `pocketsphinx` and `sphinxbase` folders.
The files `webapp/js/audioRecorder.js` and `webapp/js/audioRecorderWorker.js` are based on [Recorder.js](https://github.com/mattdiamond/Recorderjs), which is under the MIT license (Copyright © 2013 Matt Diamond).
8 changes: 4 additions & 4 deletions webapp/live_kws.html
@@ -72,7 +72,7 @@ <h2>Status</h2>
function startUserMedia(stream) {
var input = audioContext.createMediaStreamSource(stream);
// Firefox hack https://support.mozilla.org/en-US/questions/984179
-window.firefox_audio_hack = input;
+window.firefox_audio_hack = input;
var audioRecorderConfig = {errorCallback: function(x) {updateStatus("Error from recorder: " + x);}};
recorder = new AudioRecorder(input, audioRecorderConfig);
// If a recognizer is ready, we pass it to the recorder
@@ -111,7 +111,7 @@ <h2>Status</h2>
newElt.value=keywordIds[i].id;
newElt.innerHTML = keywordIds[i].title;
selectTag.appendChild(newElt);
-}
+}
};

// This adds a keyword search from the array
@@ -139,7 +139,7 @@ <h2>Status</h2>
// This initializes the recognizer. When it calls back, we add words
var initRecognizer = function() {
// You can pass parameters to the recognizer, such as : {command: 'initialize', data: [["-hmm", "my_model"], ["-fwdflat", "no"]]}
-postRecognizerJob({command: 'initialize', data: [["-kws_threshold", "2"]]},
+postRecognizerJob({command: 'initialize', data: [["-kws_threshold", "1e-25"]]},
function() {
if (recorder) recorder.consumers = [recognizer];
feedWords(wordList);});
@@ -202,7 +202,7 @@ <h2>Status</h2>
// This is the list of words that need to be added to the recognizer
// This follows the CMU dictionary format
var wordList = [["ONE", "W AH N"], ["TWO", "T UW"], ["THREE", "TH R IY"], ["FOUR", "F AO R"], ["FIVE", "F AY V"], ["SIX", "S IH K S"], ["SEVEN", "S EH V AH N"], ["EIGHT", "EY T"], ["NINE", "N AY N"], ["ZERO", "Z IH R OW"], ["NEW-YORK", "N UW Y AO R K"], ["NEW-YORK-CITY", "N UW Y AO R K S IH T IY"], ["PARIS", "P AE R IH S"] , ["PARIS(2)", "P EH R IH S"], ["SHANGHAI", "SH AE NG HH AY"], ["SAN-FRANCISCO", "S AE N F R AE N S IH S K OW"], ["LONDON", "L AH N D AH N"], ["BERLIN", "B ER L IH N"], ["SUCKS", "S AH K S"], ["ROCKS", "R AA K S"], ["IS", "IH Z"], ["NOT", "N AA T"], ["GOOD", "G IH D"], ["GOOD(2)", "G UH D"], ["GREAT", "G R EY T"], ["WINDOWS", "W IH N D OW Z"], ["LINUX", "L IH N AH K S"], ["UNIX", "Y UW N IH K S"], ["MAC", "M AE K"], ["AND", "AE N D"], ["AND(2)", "AH N D"], ["O", "OW"], ["S", "EH S"], ["X", "EH K S"]];
-var keywords = [{title: "ONE", g: "ONE"}, {title: "TWO", g: "TWO"}, {title: "NEW-YORK", g: "NEW-YORK"}];
+var keywords = [{title: "SIX", g: "SIX"}, {title: "ROCKS", g: "ROCKS"}, {title: "GREAT", g: "GREAT"}];
var keywordIds = [];
</script>
<!-- These are the two JavaScript files you must load in the HTML,
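The demo now spots "SIX", "ROCKS" and "GREAT", each of which appears in its `wordList`, and the README notes that the words of a key phrase should already be in the recognizer. A small sketch of a pre-flight check, assuming the page relies only on its own `wordList` (words that may already exist in a dictionary packaged with the acoustic model are not considered). Entries such as "PARIS(2)" are treated as pronunciation variants of "PARIS", following the CMU dictionary convention used in the list. `knownWords` and `missingWords` are hypothetical helpers.

```javascript
// Hypothetical pre-flight check: which words of a key phrase are absent
// from the word list that will be fed to the recognizer?

// Collects the base words of a CMU-style word list, stripping the "(n)"
// suffix that marks alternative pronunciations, e.g. "PARIS(2)" -> "PARIS".
function knownWords(wordList) {
  return new Set(wordList.map(([word]) => word.replace(/\(\d+\)$/, "")));
}

// Splits a space-separated key phrase and returns the words not in the list.
function missingWords(keyphrase, wordList) {
  const known = knownWords(wordList);
  return keyphrase.trim().split(/\s+/).filter((word) => !known.has(word));
}

// A subset of the demo's wordList, for illustration.
const sampleWords = [["SIX", "S IH K S"], ["ROCKS", "R AA K S"],
                     ["GREAT", "G R EY T"], ["PARIS", "P AE R IH S"],
                     ["PARIS(2)", "P EH R IH S"]];

const ok = missingWords("SIX ROCKS", sampleWords);        // returns []
const gaps = missingWords("PARIS IS GREAT", sampleWords); // returns ["IS"]
```

A non-empty result would suggest feeding the missing words, with a pronunciation, before posting `addKeyword`, since the README expects them to be present already.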
