Commit 481f18b

Add some documentation and a useful helper function
1 parent 77f2925 commit 481f18b

4 files changed

+87
-36
lines changed

.gitignore

+5
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,8 @@ build
33
dist
44
__pycache__
55
*.pyc
6+
speech_recognition/pocketsphinx-data/fr-FR/
7+
speech_recognition/pocketsphinx-data/zh-CN/
8+
fr-FR.zip
9+
zh-CN.zip
10+
pocketsphinx-python/

README.rst

+40-9
@@ -86,18 +86,31 @@ The installation instructions are quite good as of PyAudio v0.2.9. For convenien
8686
* On OS X, install PortAudio using `Homebrew <http://brew.sh/>`__: ``brew install portaudio``. Then, install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio``.
8787
* On other POSIX-based systems, install the ``portaudio19-dev`` and ``python-all-dev`` (or ``python3-all-dev`` if using Python 3) packages (or their closest equivalents) using a package manager of your choice, and then install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio`` (replace ``pip`` with ``pip3`` if using Python 3).
8888

89-
PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
89+
PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience, under the ``third-party/`` directory. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
9090

9191
PocketSphinx-Python (for Sphinx users)
9292
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9393

94-
If you want to use the Sphinx recognizer, `PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is required. If not installed, calling ``recognizer_instance.recognize_sphinx`` will fail.
94+
`PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is required if and only if you want to use the Sphinx recognizer (``recognizer_instance.recognize_sphinx``).
9595

96-
PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
96+
PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience, under the ``third-party/`` directory. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
9797

9898
Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended.
9999

100-
To build PocketSphinx-Python from source:
100+
Installing other languages
101+
^^^^^^^^^^^^^^^^^^^^^^^^^^
102+
103+
By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are also available, but not included due to the files being too large:
104+
105+
* `Metropolitan French <https://db.tt/tVNcZXao>`__
106+
* `Mandarin Chinese <https://db.tt/2YQVXmEk>`__
107+
108+
To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running ``python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"``).
109+
110+
Once installed, you can simply specify the language using the ``language`` parameter of ``recognizer_instance.recognize_sphinx``. For example, French would be specified with ``"fr-FR"`` and Mandarin with ``"zh-CN"``.
111+
112+
Building PocketSphinx-Python from source
113+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
101114

102115
* On Windows:
103116
1. Install `Python <https://www.python.org/downloads/>`__, `Pip <https://pip.pypa.io/en/stable/installing/>`__, `SWIG <http://www.swig.org/download.html>`__, and `Git <https://git-scm.com/downloads>`__, preferably using a package manager.
@@ -120,10 +133,12 @@ To build PocketSphinx-Python from source:
120133

121134
To build an installable `wheel package <https://pypi.python.org/pypi/wheel>`__ (like the ones included with this project) instead of just installing, run ``git clone --recursive https://github.com/bambocher/pocketsphinx-python && cd pocketsphinx-python && python setup.py bdist_wheel`` instead of ``pip install pocketsphinx``/``python setup.py install``. The resulting Wheel will be found in the ``dist`` folder of the PocketSphinx-Python project directory.
122135

123-
Notes on the structure of the language data:
136+
Notes on the structure of the language data
137+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
124138

125139
* Every language has its own folder under ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/``, where ``LANGUAGE_NAME`` is the IETF language tag, like ``"en-US"`` (US English) or ``"en-GB"`` (UK English).
126140
* For example, the US English data is stored in ``/speech_recognition/pocketsphinx-data/en-US/``.
141+
* The ``language`` parameter of ``recognizer_instance.recognize_sphinx`` simply chooses the folder with the given name.
127142
* Languages are composed of 3 parts:
128143
* An acoustic model ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/acoustic-model/``, which describes how to interpret audio data.
129144
* Acoustic models can be downloaded from the `CMU Sphinx files <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/>`__. These are pretty disorganized, but instructions for cleaning up specific versions are listed below.
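
The folder convention described in these notes can be sketched as a small lookup. This is only an illustration of the layout: ``language_data_directory`` is a hypothetical helper, not part of SpeechRecognition, and the ``LookupError`` it raises is an assumption rather than the exception the library actually uses. A temporary directory stands in for the module install directory.

```python
import os
import tempfile


def language_data_directory(module_directory, language="en-US"):
    """Return the data folder for ``language``, or raise if it is not installed.

    Hypothetical helper: it mirrors the folder layout described in the README,
    not the library's actual internals.
    """
    directory = os.path.join(module_directory, "pocketsphinx-data", language)
    if not os.path.isdir(directory):
        raise LookupError("missing language data directory: " + directory)
    return directory


# A throwaway directory tree stands in for the module install directory.
with tempfile.TemporaryDirectory() as module_directory:
    # Only US English is present, as in a default installation.
    os.makedirs(os.path.join(module_directory, "pocketsphinx-data", "en-US"))

    found = language_data_directory(module_directory, "en-US")

    try:
        language_data_directory(module_directory, "fr-FR")
        french_installed = True
    except LookupError:
        french_installed = False
```

Under this sketch, installing a language pack amounts to making the matching folder appear under ``pocketsphinx-data``, after which the same tag can be passed as ``language``.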
@@ -299,6 +314,22 @@ Instances of this class are context managers, and are designed to be used with `
299314
pass # do things here - ``source`` is the Microphone instance created above
300315
# the microphone is automatically released at this point
301316
317+
``Microphone.list_microphone_names()``
318+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
319+
320+
Returns a list of the names of all available microphones. For microphones where the name can't be retrieved, the list entry contains ``None`` instead.
321+
322+
The index of each microphone's name is the same as its device index when creating a ``Microphone`` instance - indices in this list can be used as values of ``device_index``.
323+
324+
To create a ``Microphone`` instance by name:
325+
326+
.. code:: python
327+
328+
m = None
329+
for i, microphone_name in enumerate(Microphone.list_microphone_names()):
330+
if microphone_name == "HDA Intel HDMI: 0 (hw:0,3)":
331+
m = Microphone(i)
332+
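
The by-name lookup can also be written as a plain function over the list of names, which makes it easy to try without audio hardware. This is a minimal sketch: ``find_device_index`` is a hypothetical helper, and ``example_names`` is made-up data standing in for the return value of ``Microphone.list_microphone_names()``, including a ``None`` entry for a device whose name could not be retrieved.

```python
def find_device_index(microphone_names, wanted_name):
    """Return the index of the first entry equal to ``wanted_name``, or None."""
    for index, name in enumerate(microphone_names):
        # Entries may be None when a device name could not be retrieved;
        # None never equals a string, so such entries are skipped naturally.
        if name == wanted_name:
            return index
    return None


# Made-up stand-in for Microphone.list_microphone_names().
example_names = ["Built-in Microphone", None, "HDA Intel HDMI: 0 (hw:0,3)"]

device_index = find_device_index(example_names, "HDA Intel HDMI: 0 (hw:0,3)")
missing_index = find_device_index(example_names, "No Such Device")
```

With the library installed, the resulting index could then be passed as ``Microphone(device_index=device_index)``; a ``None`` result leaves ``device_index`` at its default.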
302333
``WavFile(filename_or_fileobject)``
303334
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
304335

@@ -421,7 +452,7 @@ The ``callback`` parameter is a function that should accept two parameters - the
421452

422453
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using CMU Sphinx.
423454

424-
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. By default, only ``en-US`` is supported. Additional languages can be installed from ;wip
455+
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. Out of the box, only ``en-US`` is supported. See the "Installing other languages" section in the README for information about additional language packs.
425456

426457
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Hypothesis`` object generated by Sphinx.
427458

@@ -434,7 +465,7 @@ Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using
434465

435466
The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**.
436467

437-
To obtain your own API key, simply follow the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API". Note that **the API quota is 50 requests per day**, and there is currently no way to raise this limit.
468+
To obtain your own API key, simply follow the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API". Note that **the API quota for your own keys is 50 requests per day**, and there is currently no way to raise this limit.
438469

439470
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. A list of supported language codes can be found `here <http://stackoverflow.com/questions/14257598/>`__. Basically, language codes can be just the language (``en``), or a language with a dialect (``en-US``).
440471

@@ -466,7 +497,7 @@ Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using
466497

467498
The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without an account. IBM has published instructions for obtaining these credentials in the `IBM Watson Developer Cloud documentation <https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__.
468499

469-
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, ``"pt-BR"``, and ``"zh-CN"``.
500+
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
470501

471502
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize>`__ as a JSON dictionary.
472503

@@ -481,7 +512,7 @@ The AT&T Speech to Text app key and app secret are specified by ``app_key`` and
481512

482513
To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
483514

484-
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, and ``"ja-JP"``.
515+
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
485516

486517
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.
487518

speech_recognition/__init__.py

+24-8
@@ -3,7 +3,7 @@
33
"""Library for performing speech recognition with support for Google Speech Recognition, Wit.ai, IBM Speech to Text, and AT&T Speech to Text."""
44

55
__author__ = "Anthony Zhang (Uberi)"
6-
__version__ = "4.0.0"
6+
__version__ = "3.2.0"
77
__license__ = "BSD"
88

99
import io, os, subprocess, wave, base64
@@ -66,6 +66,21 @@ def __init__(self, device_index = None, sample_rate = 16000, chunk_size = 1024):
6666
self.audio = None
6767
self.stream = None
6868

69+
@staticmethod
70+
def list_microphone_names():
71+
"""
72+
Returns a list of the names of all available microphones. For microphones where the name can't be retrieved, the list entry contains ``None`` instead.
73+
74+
The index of each microphone's name is the same as its device index when creating a ``Microphone`` instance - indices in this list can be used as values of ``device_index``.
75+
"""
76+
audio = pyaudio.PyAudio()
77+
result = []
78+
for i in range(audio.get_device_count()):
79+
device_info = audio.get_device_info_by_index(i)
80+
result.append(device_info.get("name"))
81+
audio.terminate()
82+
return result
83+
6984
def __enter__(self):
7085
assert self.stream is None, "This audio source is already inside a context manager"
7186
self.audio = pyaudio.PyAudio()
@@ -409,9 +424,9 @@ def recognize_sphinx(self, audio_data, language = "en-US", show_all = False):
409424
"""
410425
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using CMU Sphinx.
411426
412-
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. By default, only ``en-US`` is supported. Additional languages can be installed from ;wip
427+
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. Out of the box, only ``en-US`` is supported. See the "Installing other languages" section in the README for information about additional language packs.
413428
414-
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Hypothesis`` object generated by Sphinx.
429+
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Decoder`` object resulting from the recognition.
415430
416431
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if there are any issues with the Sphinx installation.
417432
"""
@@ -452,11 +467,12 @@ def recognize_sphinx(self, audio_data, language = "en-US", show_all = False):
452467
# obtain recognition results
453468
decoder.start_utt() # begin utterance processing
454469
decoder.process_raw(raw_data, False, True) # process audio data with recognition enabled (no_search = False), as a full utterance (full_utt = True)
455-
hypothesis = decoder.hyp()
456470
decoder.end_utt() # stop utterance processing
457471

472+
if show_all: return decoder
473+
458474
# return results
459-
if show_all: return hypothesis
475+
hypothesis = decoder.hyp()
460476
if hypothesis is not None: return hypothesis.hypstr
461477
raise UnknownValueError() # no transcriptions available
462478
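
The reordered return logic in this hunk can be illustrated without PocketSphinx installed. The sketch below assumes nothing about the real decoder beyond a ``hyp()`` method returning either ``None`` or an object with a ``hypstr`` attribute, as the diff shows. ``FakeDecoder``, ``FakeHypothesis``, and ``result_from_decoder`` are hypothetical names, and ``UnknownValueError`` is redefined locally as a stand-in for ``speech_recognition.UnknownValueError``.

```python
class UnknownValueError(Exception):
    """Local stand-in for speech_recognition.UnknownValueError."""


class FakeHypothesis:
    """Hypothetical stand-in for a Sphinx hypothesis object."""

    def __init__(self, hypstr):
        self.hypstr = hypstr


class FakeDecoder:
    """Hypothetical stand-in for a decoder that has finished an utterance."""

    def __init__(self, hypothesis):
        self._hypothesis = hypothesis

    def hyp(self):
        return self._hypothesis


def result_from_decoder(decoder, show_all=False):
    # With show_all, hand back the decoder itself so callers can inspect it.
    if show_all:
        return decoder
    # Otherwise, ask for the best hypothesis only after the utterance has ended.
    hypothesis = decoder.hyp()
    if hypothesis is not None:
        return hypothesis.hypstr
    raise UnknownValueError()  # no transcription available


decoder = FakeDecoder(FakeHypothesis("hello world"))
text = result_from_decoder(decoder)
raw = result_from_decoder(decoder, show_all=True)

try:
    result_from_decoder(FakeDecoder(None))
    raised = False
except UnknownValueError:
    raised = True
```

Returning the decoder rather than a single hypothesis leaves it to the caller to decide what to extract from the recognition.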

@@ -556,7 +572,7 @@ def recognize_ibm(self, audio_data, username, password, language = "en-US", show
556572
557573
The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without an account. IBM has published instructions for obtaining these credentials in the `IBM Watson Developer Cloud documentation <https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__.
558574
559-
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, ``"pt-BR"``, and ``"zh-CN"``.
575+
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
560576
561577
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize>`__ as a JSON dictionary.
562578
@@ -565,7 +581,7 @@ def recognize_ibm(self, audio_data, username, password, language = "en-US", show
565581
assert isinstance(audio_data, AudioData), "Data must be audio data"
566582
assert isinstance(username, str), "`username` must be a string"
567583
assert isinstance(password, str), "`password` must be a string"
568-
assert language in ["en-US", "es-ES", "pt-BR", "zh-CN"], "`language` must be a valid language."
584+
assert language in ["en-US", "es-ES"], "`language` must be a valid language."
569585

570586
flac_data = audio_data.get_flac_data(
571587
convert_rate = None if audio_data.sample_rate >= 16000 else 16000 # audio samples should be at least 16 kHz
@@ -603,7 +619,7 @@ def recognize_att(self, audio_data, app_key, app_secret, language = "en-US", sho
603619
604620
To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
605621
606-
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, and ``"ja-JP"``.
622+
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
607623
608624
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.
609625

speech_recognition/__main__.py

+18-19
@@ -5,25 +5,24 @@
55

66
try:
77
print("A moment of silence, please...")
8-
with m as source:
9-
r.adjust_for_ambient_noise(source)
10-
print("Set minimum energy threshold to {}".format(r.energy_threshold))
11-
while True:
12-
print("Say something!")
13-
audio = r.listen(source)
14-
print("Got it! Now to recognize it...")
15-
try:
16-
# recognize speech using Google Speech Recognition
17-
value = r.recognize_google(audio)
8+
with m as source: r.adjust_for_ambient_noise(source)
9+
print("Set minimum energy threshold to {}".format(r.energy_threshold))
10+
while True:
11+
print("Say something!")
12+
with m as source: audio = r.listen(source)
13+
print("Got it! Now to recognize it...")
14+
try:
15+
# recognize speech using Google Speech Recognition
16+
value = r.recognize_google(audio)
1817

19-
# we need some special handling here to correctly print unicode characters to standard output
20-
if str is bytes: # this version of Python uses bytes for strings (Python 2)
21-
print(u"You said {}".format(value).encode("utf-8"))
22-
else: # this version of Python uses unicode for strings (Python 3+)
23-
print("You said {}".format(value))
24-
except sr.UnknownValueError:
25-
print("Oops! Didn't catch that")
26-
except sr.RequestError as e:
27-
print("Uh oh! Couldn't request results from Google Speech Recognition service; {0}".format(e))
18+
# we need some special handling here to correctly print unicode characters to standard output
19+
if str is bytes: # this version of Python uses bytes for strings (Python 2)
20+
print(u"You said {}".format(value).encode("utf-8"))
21+
else: # this version of Python uses unicode for strings (Python 3+)
22+
print("You said {}".format(value))
23+
except sr.UnknownValueError:
24+
print("Oops! Didn't catch that")
25+
except sr.RequestError as e:
26+
print("Uh oh! Couldn't request results from Google Speech Recognition service; {0}".format(e))
2827
except KeyboardInterrupt:
2928
pass
