README.rst (+40 -9)
@@ -86,18 +86,31 @@ The installation instructions are quite good as of PyAudio v0.2.9. For convenien
* On OS X, install PortAudio using `Homebrew <http://brew.sh/>`__: ``brew install portaudio``. Then, install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio``.
* On other POSIX-based systems, install the ``portaudio19-dev`` and ``python-all-dev`` (or ``python3-all-dev`` if using Python 3) packages (or their closest equivalents) using a package manager of your choice, and then install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio`` (replace ``pip`` with ``pip3`` if using Python 3).
- PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
+ PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience, under the ``third-party/`` directory. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
PocketSphinx-Python (for Sphinx users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- If you want to use the Sphinx recognizer, `PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is required. If not installed, calling ``recognizer_instance.recognize_sphinx`` will fail.
+ `PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is required if and only if you want to use the Sphinx recognizer (``recognizer_instance.recognize_sphinx``).
- PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
+ PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows and Linux are included for convenience, under the ``third-party/`` directory. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended.
- To build PocketSphinx-Python from source:
+ Installing other languages
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are available, but are not included because the files are too large:
+
+ * `Metropolitan French <https://db.tt/tVNcZXao>`__
+ * `Mandarin Chinese <https://db.tt/2YQVXmEk>`__
+
+ To install a language pack, download the ZIP archive and extract it directly into the module install directory (you can find this directory by running ``python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"``).
+
+ Once installed, you can simply specify the language using the ``language`` parameter of ``recognizer_instance.recognize_sphinx``. For example, French would be specified with ``"fr-FR"`` and Mandarin with ``"zh-CN"``.
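The path layout implied by these instructions can be sketched as follows (``sphinx_language_dir`` is a hypothetical helper for illustration, not part of the library): each language pack simply becomes a folder named after its IETF tag under the module's ``pocketsphinx-data`` directory.

```python
import os.path

def sphinx_language_dir(module_dir, language_tag):
    # Hypothetical helper: where a language pack for `language_tag` would
    # end up after extracting it into the module install directory,
    # assuming the pocketsphinx-data layout described in this README.
    return os.path.join(module_dir, "pocketsphinx-data", language_tag)
```

For example, under this assumption, a French pack for a module installed at ``/opt/sr`` would live at ``/opt/sr/pocketsphinx-data/fr-FR``.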
+
+ Building PocketSphinx-Python from source
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* On Windows:
1. Install `Python <https://www.python.org/downloads/>`__, `Pip <https://pip.pypa.io/en/stable/installing/>`__, `SWIG <http://www.swig.org/download.html>`__, and `Git <https://git-scm.com/downloads>`__, preferably using a package manager.
@@ -120,10 +133,12 @@ To build PocketSphinx-Python from source:
To build an installable `wheel package <https://pypi.python.org/pypi/wheel>`__ (like the ones included with this project) instead of just installing, run ``git clone --recursive https://github.com/bambocher/pocketsphinx-python && cd pocketsphinx-python && python setup.py bdist_wheel`` instead of ``pip install pocketsphinx``/``python setup.py install``. The resulting Wheel will be found in the ``dist`` folder of the PocketSphinx-Python project directory.
- Notes on the structure of the language data:
+ Notes on the structure of the language data
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Every language has its own folder under ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/``, where ``LANGUAGE_NAME`` is the IETF language tag, like ``"en-US"`` (US English) or ``"en-GB"`` (UK English).
* For example, the US English data is stored in ``/speech_recognition/pocketsphinx-data/en-US/``.
+ * The ``language`` parameter of ``recognizer_instance.recognize_sphinx`` simply chooses the folder with the given name.
* Languages are composed of 3 parts:
* An acoustic model ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/acoustic-model/``, which describes how to interpret audio data.
* Acoustic models can be downloaded from the `CMU Sphinx files <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/>`__. These are pretty disorganized, but instructions for cleaning up specific versions are listed below.
@@ -299,6 +314,22 @@ Instances of this class are context managers, and are designed to be used with `
pass # do things here - ``source`` is the Microphone instance created above
# the microphone is automatically released at this point
+ ``Microphone.list_microphone_names()``
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Returns a list of the names of all available microphones. For microphones where the name can't be retrieved, the list entry contains ``None`` instead.
+
+ The index of each microphone's name is the same as its device index when creating a ``Microphone`` instance - indices in this list can be used as values of ``device_index``.
+
+ To create a ``Microphone`` instance by name:
+
+ .. code:: python
+
+ m = None
+ for i, microphone_name in enumerate(Microphone.list_microphone_names()):
+     if microphone_name == "HDA Intel HDMI: 0 (hw:0,3)":
+         m = Microphone(device_index=i)
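The lookup in the example above can also be written as a small standalone helper (``find_device_index`` is hypothetical, not a library function); it works on any list of names, such as the one returned by ``Microphone.list_microphone_names()``:

```python
def find_device_index(microphone_names, target_name):
    # Return the index of the first microphone whose name matches
    # `target_name`, or None if no such microphone exists. The index
    # can then be passed to Microphone as `device_index`.
    for i, name in enumerate(microphone_names):
        if name == target_name:
            return i
    return None
```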
``WavFile(filename_or_fileobject)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -421,7 +452,7 @@ The ``callback`` parameter is a function that should accept two parameters - the
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using CMU Sphinx.
- The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. By default, only ``en-US`` is supported. Additional languages can be installed from ;wip
+ The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. Out of the box, only ``en-US`` is supported. See the "Installing other languages" section in the README for information about additional language packs.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Hypothesis`` object generated by Sphinx.
@@ -434,7 +465,7 @@ Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using
The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**.
- To obtain your own API key, simply follow the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API". Note that **the API quota is 50 requests per day**, and there is currently no way to raise this limit.
+ To obtain your own API key, simply follow the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API". Note that **the API quota for your own keys is 50 requests per day**, and there is currently no way to raise this limit.
The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. A list of supported language codes can be found `here <http://stackoverflow.com/questions/14257598/>`__. Basically, language codes can be just the language (``en``), or a language with a dialect (``en-US``).
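The two code shapes described here can be sanity-checked with a simple pattern. This is only an illustrative sketch (``looks_like_language_code`` is hypothetical) - the library itself does not validate Google language codes this way:

```python
import re

def looks_like_language_code(code):
    # Accepts a bare language tag ("en") or a language with a dialect
    # ("en-US"), the two shapes described above.
    return re.fullmatch(r"[a-z]{2,3}(-[A-Za-z]{2,4})?", code) is not None
```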
@@ -466,7 +497,7 @@ Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using
The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without an account. IBM has published instructions for obtaining these credentials in the `IBM Watson Developer Cloud documentation <https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__.
- The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, ``"pt-BR"``, and ``"zh-CN"``.
+ The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize>`__ as a JSON dictionary.
@@ -481,7 +512,7 @@ The AT&T Speech to Text app key and app secret are specified by ``app_key`` and
To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
- The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, and ``"ja-JP"``.
+ The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.
Returns a list of the names of all available microphones. For microphones where the name can't be retrieved, the list entry contains ``None`` instead.
+
+ The index of each microphone's name is the same as its device index when creating a ``Microphone`` instance - indices in this list can be used as values of ``device_index``.
+ """
+ audio = pyaudio.PyAudio()
+ result = []
+ for i in range(audio.get_device_count()):
+     device_info = audio.get_device_info_by_index(i)
+     result.append(device_info.get("name"))
+ audio.terminate()
+ return result
def __enter__(self):
    assert self.stream is None, "This audio source is already inside a context manager"
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using CMU Sphinx.
- The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. By default, only ``en-US`` is supported. Additional languages can be installed from ;wip
+ The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. Out of the box, only ``en-US`` is supported. See the "Installing other languages" section in the README for information about additional language packs.
- Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Hypothesis`` object generated by Sphinx.
+ Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Decoder`` object resulting from the recognition.
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if there are any issues with the Sphinx installation.
decoder.process_raw(raw_data, False, True) # process audio data with recognition enabled (no_search = False), as a full utterance (full_utt = True)
- hypothesis = decoder.hyp()
decoder.end_utt() # stop utterance processing
+ if show_all: return decoder
+
# return results
- if show_all: return hypothesis
+ hypothesis = decoder.hyp()
if hypothesis is not None: return hypothesis.hypstr
raise UnknownValueError() # no transcriptions available
@@ -556,7 +572,7 @@ def recognize_ibm(self, audio_data, username, password, language = "en-US", show
The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without an account. IBM has published instructions for obtaining these credentials in the `IBM Watson Developer Cloud documentation <https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__.
- The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, ``"pt-BR"``, and ``"zh-CN"``.
+ The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize>`__ as a JSON dictionary.
@@ -565,7 +581,7 @@ def recognize_ibm(self, audio_data, username, password, language = "en-US", show
assert isinstance(audio_data, AudioData), "Data must be audio data"
assert isinstance(username, str), "`username` must be a string"
assert isinstance(password, str), "`password` must be a string"
- assert language in ["en-US", "es-ES", "pt-BR", "zh-CN"], "`language` must be a valid language."
+ assert language in ["en-US", "es-ES"], "`language` must be a valid language."
flac_data = audio_data.get_flac_data(
    convert_rate = None if audio_data.sample_rate >= 16000 else 16000 # audio samples should be at least 16 kHz
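The ``convert_rate`` expression above reduces to a small pure rule - keep the original rate when it is already at least 16 kHz, otherwise resample up to 16 kHz. A minimal sketch (the helper name is hypothetical):

```python
def target_convert_rate(sample_rate):
    # None means "keep the original sample rate"; lower rates are
    # upsampled to the 16 kHz minimum the service expects.
    return None if sample_rate >= 16000 else 16000
```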
To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
- The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, and ``"ja-JP"``.
+ The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.