Commit 25439c3

Various updates to Microphone and recognize_google_cloud, as well as documentation surrounding those
1 parent cdb42b1 commit 25439c3

17 files changed, +153 -74 lines changed

LICENSE.txt

+1-1
@@ -1,4 +1,4 @@
1-
Copyright (c) 2014-2016, Anthony Zhang <[email protected]>
1+
Copyright (c) 2014-2017, Anthony Zhang <[email protected]>
22
All rights reserved.
33

44
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

README.rst

+20-14
@@ -61,6 +61,7 @@ See the ``examples/`` `directory <https://github.com/Uberi/speech_recognition/tr
6161
- `Show extended recognition results <https://github.com/Uberi/speech_recognition/blob/master/examples/extended_results.py>`__
6262
- `Calibrate the recognizer energy threshold for ambient noise levels <https://github.com/Uberi/speech_recognition/blob/master/examples/calibrate_energy_threshold.py>`__ (see ``recognizer_instance.energy_threshold`` for details)
6363
- `Listening to a microphone in the background <https://github.com/Uberi/speech_recognition/blob/master/examples/background_listening.py>`__
64+
- `Various other useful recognizer features <https://github.com/Uberi/speech_recognition/blob/master/examples/special_recognizer_features.py>`__
6465

6566
Installing
6667
----------
@@ -80,8 +81,8 @@ To use all of the functionality of the library, you should have:
8081

8182
* **Python** 2.6, 2.7, or 3.3+ (required)
8283
* **PyAudio** 0.2.9+ (required only if you need to use microphone input, ``Microphone``)
83-
* **google-api-python-client** (required only if you need to use the Google Cloud Speech API)
8484
* **PocketSphinx** (required only if you need to use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
85+
* **Google API Client Library for Python** (required only if you need to use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
8586
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)
8687

8788
The following requirements are optional, but can improve or extend functionality in some situations:
@@ -101,7 +102,7 @@ PyAudio (for microphone users)
101102

102103
`PyAudio <http://people.csail.mit.edu/hubert/pyaudio/#downloads>`__ is required if and only if you want to use microphone input (``Microphone``). PyAudio version 0.2.9+ is required, as earlier versions have overflow issues with recording on certain machines.
103104

104-
If not installed, everything in the library will still work, except attempting to instantiate a ``Microphone`` object will throw an ``AttributeError``.
105+
If not installed, everything in the library will still work, except attempting to instantiate a ``Microphone`` object will raise an ``AttributeError``.
105106

106107
The installation instructions are quite good as of PyAudio v0.2.9. For convenience, they are summarized below:
107108

@@ -113,13 +114,6 @@ The installation instructions are quite good as of PyAudio v0.2.9. For convenien
113114

114115
PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience, under the ``third-party/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/third-party>`__ in the repository root. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the repository `root directory <https://github.com/Uberi/speech_recognition>`__.
115116

116-
google-api-python-client (for Google Cloud Speech API users)
117-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
118-
119-
`google-api-python-client <https://developers.google.com/api-client-library/python/>`__ is required if and only if you want to use the Google Cloud Speech API.
120-
121-
If it is not installed, ``recognize_google_cloud()`` will raise ``ImportError.``
122-
123117
PocketSphinx-Python (for Sphinx users)
124118
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
125119

@@ -133,6 +127,17 @@ Note that the versions available in most package repositories are outdated and w
133127

134128
See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.
135129

130+
Google API Client Library for Python (for Google Cloud Speech API users)
131+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
132+
133+
`Google API Client Library for Python <https://developers.google.com/api-client-library/python/>`__ is required if and only if you want to use the Google Cloud Speech API (``recognizer_instance.recognize_google_cloud``).
134+
135+
If not installed, everything in the library will still work, except calling ``recognizer_instance.recognize_google_cloud`` will raise a ``RequestError``.
136+
137+
According to the `official installation instructions <https://developers.google.com/api-client-library/python/start/installation>`__, the recommended way to install this is using `Pip <https://pip.readthedocs.org/>`__: execute ``pip install google-api-python-client`` (replace ``pip`` with ``pip3`` if using Python 3).
138+
139+
Alternatively, you can perform the installation completely offline from the source archives under the ``./third-party/Source code for Google API Client Library for Python and its dependencies/`` directory.
140+
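As a rough illustration of the dependency described above, the following sketch checks whether the client library is importable before attempting to use the Google Cloud Speech recognizer. It assumes Python 3.4+ (for ``importlib.util.find_spec``) and that the package installed by ``pip install google-api-python-client`` is importable as ``googleapiclient``; the helper name ``google_api_client_available`` is made up for this example and is not part of the library.

```python
import importlib.util


def google_api_client_available():
    """Return True if the ``googleapiclient`` package can be found (hypothetical helper)."""
    # find_spec returns None when no top-level module with this name is installed
    return importlib.util.find_spec("googleapiclient") is not None


if __name__ == "__main__":
    if google_api_client_available():
        print("Google API Client Library for Python found")
    else:
        print("Not installed; try: pip install google-api-python-client")
```

A check like this only reports whether the package is present; it does not validate credentials or network access.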
136141
FLAC (for some systems)
137142
~~~~~~~~~~~~~~~~~~~~~~~
138143

@@ -177,10 +182,10 @@ Try setting the recognition language to your language/dialect. To do this, see t
177182

178183
For example, if your language/dialect is British English, it is better to use ``"en-GB"`` as the language rather than ``"en-US"``.
179184

180-
The code examples throw ``UnicodeEncodeError: 'ascii' codec can't encode character`` when run.
185+
The code examples raise ``UnicodeEncodeError: 'ascii' codec can't encode character`` when run.
181186
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
182187

183-
When you're using Python 2, and your language uses non-ASCII characters, and the terminal or file-like object you're printing to only supports ASCII, an error is thrown when trying to write non-ASCII characters.
188+
When you're using Python 2, and your language uses non-ASCII characters, and the terminal or file-like object you're printing to only supports ASCII, an error is raised when trying to write non-ASCII characters.
184189

185190
This is because in Python 2, ``recognizer_instance.recognize_sphinx``, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_bing``, ``recognizer_instance.recognize_api``, ``recognizer_instance.recognize_houndify``, and ``recognizer_instance.recognize_ibm`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings.
186191

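The failure mode described above can be reproduced without any speech recognition at all. The sketch below uses a made-up transcription string and shows one possible workaround (replacing unencodable characters); it is an illustration under those assumptions, not necessarily the approach this README recommends.

```python
# -*- coding: utf-8 -*-
text = u"num\u00e9ro"  # stand-in for a unicode transcription with a non-ASCII character

try:
    text.encode("ascii")  # what an ASCII-only output stream effectively attempts
except UnicodeEncodeError as e:
    print("strict ASCII encoding fails: {0}".format(e))

# One possible workaround: substitute "?" for characters that ASCII cannot represent
safe = text.encode("ascii", "replace").decode("ascii")
print(safe)
```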
@@ -307,23 +312,24 @@ Authors
307312
kamushadenes <[email protected]> (Kamus Hadenes)
308313
sbraden <[email protected]> (Sarah Braden)
309314
tb0hdan (Bohdan Turkynewych)
315+
Thynix <[email protected]> (Steve Dougherty)
310316

311317
Please report bugs and suggestions at the `issue tracker <https://github.com/Uberi/speech_recognition/issues>`__!
312318

313319
How to cite this library (APA style):
314320

315-
Zhang, A. (2016). Speech Recognition (Version 3.5) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
321+
Zhang, A. (2017). Speech Recognition (Version 3.5) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
316322

317323
How to cite this library (Chicago style):
318324

319-
Zhang, Anthony. 2016. *Speech Recognition* (version 3.5).
325+
Zhang, Anthony. 2017. *Speech Recognition* (version 3.5).
320326

321327
Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaiduYuyin>`__, which is based on an older version of this project, and adds support for `Baidu Yuyin <http://yuyin.baidu.com/>`__. Note that Baidu Yuyin is only available inside China.
322328

323329
License
324330
-------
325331

326-
Copyright 2014-2016 `Anthony Zhang (Uberi) <https://uberi.github.io>`__. The source code for this library is available online at `GitHub <https://github.com/Uberi/speech_recognition>`__.
332+
Copyright 2014-2017 `Anthony Zhang (Uberi) <https://uberi.github.io>`__. The source code for this library is available online at `GitHub <https://github.com/Uberi/speech_recognition>`__.
327333

328334
SpeechRecognition is made available under the 3-clause BSD license. See ``LICENSE.txt`` in the project's `root directory <https://github.com/Uberi/speech_recognition>`__ for more information.
329335

examples/audio_transcribe.py

+9
@@ -32,6 +32,15 @@
3232
except sr.RequestError as e:
3333
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
3434

35+
# recognize speech using Google Cloud Speech
36+
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
37+
try:
38+
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
39+
except sr.UnknownValueError:
40+
    print("Google Cloud Speech could not understand audio")
41+
except sr.RequestError as e:
42+
    print("Could not request results from Google Cloud Speech service; {0}".format(e))
43+
3544
# recognize speech using Wit.ai
3645
WIT_AI_KEY = "INSERT WIT.AI API KEY HERE" # Wit.ai keys are 32-character uppercase alphanumeric strings
3746
try:

examples/extended_results.py

+10
@@ -35,6 +35,16 @@
3535
except sr.RequestError as e:
3636
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
3737

38+
# recognize speech using Google Cloud Speech
39+
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
40+
try:
41+
    print("Google Cloud Speech recognition results:")
42+
    pprint(r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, show_all=True))  # pretty-print the recognition result
43+
except sr.UnknownValueError:
44+
    print("Google Cloud Speech could not understand audio")
45+
except sr.RequestError as e:
46+
    print("Could not request results from Google Cloud Speech service; {0}".format(e))
47+
3848
# recognize speech using Wit.ai
3949
WIT_AI_KEY = "INSERT WIT.AI API KEY HERE" # Wit.ai keys are 32-character uppercase alphanumeric strings
4050
try:

examples/microphone_recognition.py

+9
@@ -29,6 +29,15 @@
2929
except sr.RequestError as e:
3030
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
3131

32+
# recognize speech using Google Cloud Speech
33+
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
34+
try:
35+
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
36+
except sr.UnknownValueError:
37+
    print("Google Cloud Speech could not understand audio")
38+
except sr.RequestError as e:
39+
    print("Could not request results from Google Cloud Speech service; {0}".format(e))
40+
3241
# recognize speech using Wit.ai
3342
WIT_AI_KEY = "INSERT WIT.AI API KEY HERE" # Wit.ai keys are 32-character uppercase alphanumeric strings
3443
try:

examples/special_recognizer_features.py

+36
@@ -0,0 +1,36 @@
1+
#!/usr/bin/env python3
2+
3+
import speech_recognition as sr
4+
5+
from os import path
6+
AUDIO_FILE_EN = path.join(path.dirname(path.realpath(__file__)), "english.wav")
7+
AUDIO_FILE_FR = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
8+
9+
# use the audio file as the audio source
10+
r = sr.Recognizer()
11+
with sr.AudioFile(AUDIO_FILE_EN) as source:
12+
    audio_en = r.record(source)  # read the entire audio file
13+
with sr.AudioFile(AUDIO_FILE_FR) as source:
14+
    audio_fr = r.record(source)  # read the entire audio file
15+
16+
# recognize keywords using Sphinx
17+
try:
18+
    print("Sphinx recognition for \"one two three\" with different sets of keywords:")
19+
    print(r.recognize_sphinx(audio_en, keyword_entries=[("one", 1.0), ("two", 1.0), ("three", 1.0)]))
20+
    print(r.recognize_sphinx(audio_en, keyword_entries=[("wan", 0.95), ("too", 1.0), ("tree", 1.0)]))
21+
    print(r.recognize_sphinx(audio_en, keyword_entries=[("un", 0.95), ("to", 1.0), ("tee", 1.0)]))
22+
except sr.UnknownValueError:
23+
    print("Sphinx could not understand audio")
24+
except sr.RequestError as e:
25+
    print("Sphinx error; {0}".format(e))
26+
27+
# recognize preferred phrases using Google Cloud Speech
28+
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
29+
try:
30+
    print("Google Cloud Speech recognition for \"numero\" with different sets of preferred phrases:")
31+
    print(r.recognize_google_cloud(audio_fr, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, preferred_phrases=["noomarow"]))
32+
    print(r.recognize_google_cloud(audio_fr, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, preferred_phrases=["newmarrow"]))
33+
except sr.UnknownValueError:
34+
    print("Google Cloud Speech could not understand audio")
35+
except sr.RequestError as e:
36+
    print("Could not request results from Google Cloud Speech service; {0}".format(e))

reference/library-reference.rst

+16-1
@@ -198,12 +198,27 @@ The Google Speech Recognition API key is specified by ``key``. If not specified,
198198

199199
To obtain your own API key, simply follow the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API". Note that **the API quota for your own keys is 50 requests per day**, and there is currently no way to raise this limit.
200200

201-
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. A list of supported language codes can be found `here <http://stackoverflow.com/questions/14257598/what-are-language-codes-for-voice-recognition-languages-in-chromes-implementati>`__. Basically, language codes can be just the language (``en``), or a language with a dialect (``en-US``).
201+
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. A list of supported language tags can be found `here <http://stackoverflow.com/questions/14257598/what-are-language-codes-for-voice-recognition-languages-in-chromes-implementati>`__. Basically, language codes can be just the language (``en``), or a language with a dialect (``en-US``).
202202

203203
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
204204

205205
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
206206

207+
``recognizer_instance.recognize_google_cloud(audio_data, credentials_json_file_path = None, language = "en-US", preferred_phrases = None, show_all = False)``
208+
------------------------------------------------------------------------------------------------------
209+
210+
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.
211+
212+
This function requires a Google Cloud Platform account; see the `Google Cloud Speech API Quickstart <https://cloud.google.com/speech/docs/getting-started>`__ for details and instructions. Basically, create a project, enable billing for the project, enable the Google Cloud Speech API for the project, and set up Service Account Key credentials for the project. The result is a JSON file containing the API credentials. The text content of this JSON file is specified by ``credentials_json``. If not specified, the library will try to automatically `find the default API credentials JSON file <https://developers.google.com/identity/protocols/application-default-credentials>`__.
213+
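Since ``credentials_json`` takes the text content of the credentials file rather than a path to it, a small helper along these lines may be convenient. This is a minimal sketch: the helper name ``read_credentials_text`` and the file name in the usage comment are placeholders invented for illustration, not part of the library.

```python
import json


def read_credentials_text(path):
    """Return the text content of a JSON credentials file (hypothetical helper)."""
    with open(path, "r") as f:
        text = f.read()
    json.loads(text)  # sanity check: raises ValueError if the file is not valid JSON
    return text


# Hypothetical usage, assuming `r` is a Recognizer and `audio` is an AudioData instance:
# credentials = read_credentials_text("service-account-key.json")  # placeholder path
# print(r.recognize_google_cloud(audio, credentials_json=credentials))
```

Validating the JSON up front surfaces a malformed file early, before any request is made.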
214+
The recognition language is determined by ``language``, which is a BCP-47 language tag like ``"en-US"`` (US English). A list of supported language tags can be found in the `Google Cloud Speech API documentation <https://cloud.google.com/speech/docs/languages>`__.
215+
216+
If ``preferred_phrases`` is a list of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.
217+
218+
Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.
219+
220+
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
221+
207222
``recognizer_instance.recognize_wit(audio_data, key, show_all = False)``
208223
------------------------------------------------------------------------
209224
