Clean up api.ai support, remove AT&T support since AT&T is shutting it down
Uberi committed Apr 2, 2016
1 parent 3d2377a commit 266ad1f
Showing 7 changed files with 105 additions and 187 deletions.
22 changes: 16 additions & 6 deletions README.rst
@@ -21,7 +21,15 @@ Speech Recognition
:target: https://pypi.python.org/pypi/SpeechRecognition/
:alt: License

Library for performing speech recognition with support for `CMU Sphinx <http://cmusphinx.sourceforge.net/wiki/>`__, Google Speech Recognition, `Wit.ai <https://wit.ai/>`__, `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__, and `AT&T Speech to Text <http://developer.att.com/apis/speech>`__.
Library for performing speech recognition, with support for several engines and APIs, online and offline.

Speech recognition engine/API support:

* `CMU Sphinx <http://cmusphinx.sourceforge.net/wiki/>`__ (works offline)
* Google Speech Recognition
* `Wit.ai <https://wit.ai/>`__
* `api.ai <https://api.ai/>`__
* `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__

**Quickstart:** ``pip install SpeechRecognition``. See the "Installing" section for more details.

@@ -135,7 +143,7 @@ The solution is to decrease this threshold, or call ``recognizer_instance.adjust
The recognizer doesn't understand my particular language/dialect.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Try setting the recognition language to your language/dialect. To do this, see the documentation for ``recognizer_instance.recognize_sphinx``, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_ibm``, and ``recognizer_instance.recognize_att``.
Try setting the recognition language to your language/dialect. To do this, see the documentation for ``recognizer_instance.recognize_sphinx``, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_api``, and ``recognizer_instance.recognize_ibm``.

For example, if your language/dialect is British English, it is better to use ``"en-GB"`` as the language rather than ``"en-US"``.
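
In code, that looks something like the following (a sketch; ``r`` and ``audio`` stand in for a ``Recognizer`` instance and an ``AudioData`` instance)::

    r.recognize_google(audio, language = "en-GB")  # British English instead of the default US English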

@@ -144,7 +152,7 @@ The code examples throw ``UnicodeEncodeError: 'ascii' codec can't encode charact

When you're using Python 2, and your language uses non-ASCII characters, and the terminal or file-like object you're printing to only supports ASCII, an error is thrown when trying to write non-ASCII characters.

This is because in Python 2, ``recognizer_instance.recognize_sphinx``, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_ibm``, and ``recognizer_instance.recognize_att`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings.
This is because in Python 2, ``recognizer_instance.recognize_sphinx``, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_api``, and ``recognizer_instance.recognize_ibm`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings.

To make printing of unicode strings work in Python 2 as well, replace all print statements in your code of the following form:
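
For instance, a sketch of the change (using a placeholder string)::

    print(u"something")

would be rewritten as::

    print(u"something".encode("utf-8"))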

@@ -225,18 +233,20 @@ Authors
haas85
DelightRun <[email protected]>
maverickagm
kamushadenes <[email protected]> (Kamus Hadenes)
sbraden <[email protected]> (Sarah Braden)

Please report bugs and suggestions at the `issue tracker <https://github.com/Uberi/speech_recognition/issues>`__!

How to cite this library (APA style):

Zhang, A. (2016). Speech Recognition (Version 3.2) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
Zhang, A. (2016). Speech Recognition (Version 3.3) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.

How to cite this library (Chicago style):

Zhang, Anthony. 2016. *Speech Recognition* (version 3.2).
Zhang, Anthony. 2016. *Speech Recognition* (version 3.3).

Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaiduYuyin>`__, which is based on an older version of this project, and adds support for `Baidu Yuyin <http://yuyin.baidu.com/>`__.
Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaiduYuyin>`__, which is based on an older version of this project, and adds support for `Baidu Yuyin <http://yuyin.baidu.com/>`__. Note that Baidu Yuyin is only available inside China.

License
-------
21 changes: 11 additions & 10 deletions examples/extended_results.py
@@ -43,6 +43,17 @@
except sr.RequestError as e:
print("Could not request results from Wit.ai service; {0}".format(e))

# recognize speech using api.ai
API_AI_CLIENT_ACCESS_TOKEN = "INSERT API.AI API KEY HERE" # api.ai keys are 32-character lowercase hexadecimal strings
try:
from pprint import pprint
print("api.ai recognition results:")
pprint(r.recognize_api(audio, client_access_token=API_AI_CLIENT_ACCESS_TOKEN, show_all=True)) # pretty-print the recognition result
except sr.UnknownValueError:
print("api.ai could not understand audio")
except sr.RequestError as e:
print("Could not request results from api.ai service; {0}".format(e))

# recognize speech using IBM Speech to Text
IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE" # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE" # IBM Speech to Text passwords are mixed-case alphanumeric strings
@@ -54,13 +65,3 @@
print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from IBM Speech to Text service; {0}".format(e))

# recognize speech using AT&T Speech to Text
ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
try:
print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
except sr.UnknownValueError:
print("AT&T Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from AT&T Speech to Text service; {0}".format(e))
19 changes: 9 additions & 10 deletions examples/microphone_recognition.py
@@ -38,6 +38,15 @@
except sr.RequestError as e:
print("Could not request results from Wit.ai service; {0}".format(e))

# recognize speech using api.ai
API_AI_CLIENT_ACCESS_TOKEN = "INSERT API.AI API KEY HERE" # api.ai keys are 32-character lowercase hexadecimal strings
try:
print("api.ai thinks you said " + r.recognize_api(audio, client_access_token=API_AI_CLIENT_ACCESS_TOKEN))
except sr.UnknownValueError:
print("api.ai could not understand audio")
except sr.RequestError as e:
print("Could not request results from api.ai service; {0}".format(e))

# recognize speech using IBM Speech to Text
IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE" # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE" # IBM Speech to Text passwords are mixed-case alphanumeric strings
@@ -47,13 +56,3 @@
print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from IBM Speech to Text service; {0}".format(e))

# recognize speech using AT&T Speech to Text
ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
try:
print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
except sr.UnknownValueError:
print("AT&T Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from AT&T Speech to Text service; {0}".format(e))
30 changes: 9 additions & 21 deletions examples/wav_transcribe.py
@@ -39,6 +39,15 @@
except sr.RequestError as e:
print("Could not request results from Wit.ai service; {0}".format(e))

# recognize speech using api.ai
API_AI_CLIENT_ACCESS_TOKEN = "INSERT API.AI API KEY HERE" # api.ai keys are 32-character lowercase hexadecimal strings
try:
print("api.ai thinks you said " + r.recognize_api(audio, client_access_token=API_AI_CLIENT_ACCESS_TOKEN))
except sr.UnknownValueError:
print("api.ai could not understand audio")
except sr.RequestError as e:
print("Could not request results from api.ai service; {0}".format(e))

# recognize speech using IBM Speech to Text
IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE" # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE" # IBM Speech to Text passwords are mixed-case alphanumeric strings
@@ -48,24 +57,3 @@
print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from IBM Speech to Text service; {0}".format(e))

# recognize speech using api.ai Speech to Text
# Note: Use the developer access token for managing entities and intents, and use the client access token for making queries.
API_AI_CLIENT_ACCESS_TOKEN = "INSERT API.AI SPEECH TO TEXT ACCESS TOKEN HERE" # api.ai access tokens are 32-character lowercase alphanumeric strings
API_AI_SUBSCRIPTION_KEY = "INSERT API.AI SPEECH TO TEXT SUBSCRIPTION KEY HERE" # api.ai subscription_keys are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
try:
print("api.ai Speech to Text thinks you said " + r.recognize_api(audio, username=API_AI_CLIENT_ACCESS_TOKEN, password=API_AI_SUBSCRIPTION_KEY))
except sr.UnknownValueError:
print("api.ai Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from api.ai Speech to Text service; {0}".format(e))

# recognize speech using AT&T Speech to Text
ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
try:
print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
except sr.UnknownValueError:
print("AT&T Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from AT&T Speech to Text service; {0}".format(e))
36 changes: 16 additions & 20 deletions reference/library-reference.rst
@@ -192,45 +192,41 @@ Raises a ``speech_recognition.UnknownValueError`` exception if the speech is uni

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Wit.ai API.

The Wit.ai API key is specified by ``key``. Unfortunately, these are not available without `signing up for an account <https://wit.ai/getting-started>`__ and creating an app. You will need to add at least one intent (recognizable sentence) before the API key can be accessed, though the actual intent values don't matter.
The Wit.ai API key is specified by ``key``. Unfortunately, these are not available without `signing up for an account <https://wit.ai/>`__ and creating an app. You will need to add at least one intent to the app before you can see the API key, though the actual intent settings don't matter.

To get the API key for a Wit.ai app, go to the app settings, go to the section titled "API Details", and look for "Server Access Token" or "Client Access Token". If the desired field is blank, click on the "Reset token" button on the right of the field. Wit.ai API keys are 32-character uppercase alphanumeric strings.

Though Wit.ai is designed to be used with a fixed set of phrases, it still provides services for general-purpose speech recognition.
To get the API key for a Wit.ai app, go to the app's overview page, go to the section titled "Make an API request", and look for something along the lines of ``Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX``; ``XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX`` is the API key. Wit.ai API keys are 32-character uppercase alphanumeric strings.

The recognition language is configured in the Wit.ai app settings.

Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://wit.ai/docs/http/20141022#get-intent-via-text-link>`__ as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the key isn't valid, the quota for the key is maxed out, or there is no internet connection.
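
A minimal usage sketch (mirroring the example scripts above; the microphone source requires PyAudio and is just one possible ``AudioSource``)::

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:  # any AudioSource works here
        audio = r.listen(source)

    WIT_AI_KEY = "INSERT WIT.AI API KEY HERE"  # Wit.ai keys are 32-character uppercase alphanumeric strings
    try:
        print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
    except sr.UnknownValueError:
        print("Wit.ai could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Wit.ai service; {0}".format(e))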

``recognizer_instance.recognize_ibm(audio_data, username, password, language = "en-US", show_all = False)``
-----------------------------------------------------------------------------------------------------------

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the IBM Speech to Text API.
``recognizer_instance.recognize_api(audio_data, client_access_token, show_all = False)``
------------------------------------------------------------------------------------------

The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without an account. IBM has published instructions for obtaining these credentials in the `IBM Watson Developer Cloud documentation <https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__.
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the api.ai Speech to Text API.

The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
The api.ai API client access token is specified by ``client_access_token``. Unfortunately, this is not available without `signing up for an account <https://console.api.ai/api-client/#/signup>`__ and creating an agent. To get the API client access token, go to the agent settings, go to the section titled "API keys", and look for "Client access token". API client access tokens are 32-character lowercase hexadecimal strings.

Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize>`__ as a JSON dictionary.
The recognition language is set when creating an agent in the web console.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if an error occurred, such as an invalid key, or a broken internet connection.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://api.ai/docs/reference/#a-namepost-multipost-query-multipart>`__ as a JSON dictionary.

``recognizer_instance.recognize_att(audio_data, app_key, app_secret, language = "en-US", show_all = False)``
------------------------------------------------------------------------------------------------------------
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the key isn't valid, the quota for the key is maxed out, or there is no internet connection.

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the AT&T Speech to Text API.
``recognizer_instance.recognize_ibm(audio_data, username, password, language = "en-US", show_all = False)``
-----------------------------------------------------------------------------------------------------------

The AT&T Speech to Text app key and app secret are specified by ``app_key`` and ``app_secret``, respectively. Unfortunately, these are not available without `signing up for an account <http://developer.att.com/apis/speech>`__ and creating an app.
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the IBM Speech to Text API.

To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without `signing up for an account <https://console.ng.bluemix.net/registration/>`__. Once logged into the Bluemix console, follow the instructions for `creating an IBM Watson service instance <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__, where the Watson service is "Speech To Text". IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, while passwords are mixed-case alphanumeric strings.

The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-ES"``.
The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. The supported languages are listed under the ``model`` parameter of the `audio recognition API documentation <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize_audio_sessionless12>`__.

Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize_audio_sessionless12>`__ as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the key isn't valid, or there is no internet connection.
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if an error occurred, such as an invalid key, or a broken internet connection.
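
A minimal usage sketch (again assuming ``r`` is a ``Recognizer`` instance and ``audio`` is an ``AudioData`` instance obtained as in the sketch above)::

    IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # passwords are mixed-case alphanumeric strings
    try:
        print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
    except sr.UnknownValueError:
        print("IBM Speech to Text could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from IBM Speech to Text service; {0}".format(e))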

``AudioSource``
---------------
2 changes: 1 addition & 1 deletion setup.py
@@ -43,7 +43,7 @@ def run(self):
description = speech_recognition.__doc__,
long_description = open("README.rst").read(),
license = speech_recognition.__license__,
keywords = "speech recognition google wit ibm att",
keywords = "speech recognition google wit api ibm",
url = "https://github.com/Uberi/speech_recognition#readme",
classifiers = [
"Development Status :: 5 - Production/Stable",