Commit cdb42b1

Update documentation, update tests
1 parent 081b0fc commit cdb42b1

9 files changed, +123 -17 lines changed

.gitignore

+1
@@ -8,3 +8,4 @@ speech_recognition/pocketsphinx-data/zh-CN/
 fr-FR.zip
 zh-CN.zip
 pocketsphinx-python/
+venv/

reference/library-reference.rst

+1-1
@@ -228,7 +228,7 @@ The Microsoft Bing Voice Recognition API key is specified by ``key``. Unfortunat

 To get the API key, go to the `Microsoft Cognitive Services subscriptions overview <https://www.microsoft.com/cognitive-services/en-us/subscriptions>`__, go to the entry titled "Speech", and look for the key under the "Keys" column. Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings.

-The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-4-supported-locales>`__.
+The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#SupLocales>`__.

 Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-3-voice-recognition-responses>`__ as a JSON dictionary.

reference/pocketsphinx.rst

+13-5
@@ -6,8 +6,8 @@ Installing other languages

 By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are also available, but not included due to the files being too large:

-* `International French <https://db.tt/tVNcZXao>`__
-* `Mandarin Chinese <https://db.tt/2YQVXmEk>`__
+* `International French <https://www.dropbox.com/s/115e3mf3y21x0b8/fr-FR.zip?dl=1>`__
+* `Mandarin Chinese <https://www.dropbox.com/s/0iwx5ypp9uym66c/zh-CN.zip?dl=1>`__

 To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running ``python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"``).

@@ -94,7 +94,7 @@ Notes on building the language data from source
 * International French: ``/speech_recognition/pocketsphinx-data/fr-FR/``:
     * ``/speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin`` is ``fr-small.lm.bin`` from the `Sphinx French language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/>`__.
     * ``/speech_recognition/pocketsphinx-data/fr-FR/pronounciation-dictionary.dict`` is ``fr.dict`` from the `Sphinx French language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/>`__.
-    * ``/speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/`` is extracted from ``cmusphinx-fr-5.2.tar.gz`` in the `Sphinx French acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/>`__.
+    * ``/speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/`` contains all of the files extracted from ``cmusphinx-fr-5.2.tar.gz`` in the `Sphinx French acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/>`__.
     * To get better French recognition accuracy at the expense of higher disk space and RAM usage:
         1. Download ``fr.lm.gmp`` from the `Sphinx French language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/>`__.
         2. Convert from DMP (an obsolete Sphinx binary format) to ARPA format: ``sphinx_lm_convert -i fr.lm.gmp -o french.lm.bin``.
@@ -107,5 +107,13 @@ Notes on building the language data from source
         4. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i chinese.lm -o chinese.lm.bin``.
         5. Replace ``/speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin`` with ``chinese.lm.bin`` created in the previous step.
     * ``/speech_recognition/pocketsphinx-data/zh-CN/pronounciation-dictionary.dict`` is ``zh_broadcastnews_utf8.dic`` from the `Sphinx Mandarin language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Language%20Model/>`__.
-    * ``/speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/`` is extracted from ``zh_broadcastnews_16k_ptm256_8000.tar.bz2`` in the `Sphinx Mandarin acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Broadcast%20News%20acoustic%20models/>`__.
-    * To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 when preparing ``zh_broadcastnews_64000_utf8.DMP``.
+    * ``/speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/`` contains all of the files extracted from ``zh_broadcastnews_16k_ptm256_8000.tar.bz2`` in the `Sphinx Mandarin acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Broadcast%20News%20acoustic%20models/>`__.
+    * To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 when preparing ``zh_broadcastnews_64000_utf8.DMP``.
+* Italian: ``/speech_recognition/pocketsphinx-data/it-IT/``:
+    * ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` is generated as follows:
+        1. Download ``cmusphinx-it-5.2.tar.gz`` from the `Sphinx Italian language model <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Italian/>`__.
+        2. Extract ``/etc/voxforge_it_sphinx.lm`` from ``cmusphinx-it-5.2.tar.gz`` as ``italian.lm``.
+        3. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i italian.lm -o italian.lm.bin``.
+        4. Replace ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` with ``italian.lm.bin`` created in the previous step.
+    * ``/speech_recognition/pocketsphinx-data/it-IT/pronounciation-dictionary.dict`` is ``/etc/voxforge_it_sphinx.dic`` from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Italian/>`__).
+    * ``/speech_recognition/pocketsphinx-data/it-IT/acoustic-model/`` contains all of the files in ``/model_parameters`` extracted from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Italian/>`__).
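
Reviewer note: the install step documented in ``reference/pocketsphinx.rst`` (download a ZIP archive and extract it into the module install directory) could be scripted roughly as below. This is a minimal sketch and not part of the commit. ``install_language_pack`` is a hypothetical helper, and it assumes the archive has already been downloaded and that its internal layout matches what the module expects.

```python
import os
import zipfile


def install_language_pack(zip_path, install_dir):
    """Extract a language pack archive into install_dir (hypothetical helper).

    Returns the list of archive member names that were extracted.
    """
    if not os.path.isdir(install_dir):
        raise ValueError("install directory does not exist: {}".format(install_dir))
    with zipfile.ZipFile(zip_path) as archive:
        # extractall recreates the directory structure stored in the archive
        archive.extractall(install_dir)
        return archive.namelist()
```

The ``install_dir`` argument would be the directory printed by the one-liner in the documentation, that is, the directory containing ``speech_recognition/__init__.py``.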

speech_recognition/__init__.py

+4-4
@@ -195,12 +195,12 @@ def __enter__(self):
             # attempt to read the file as WAV
             self.audio_reader = wave.open(self.filename_or_fileobject, "rb")
             self.little_endian = True # RIFF WAV is a little-endian format (most ``audioop`` operations assume that the frames are stored in little-endian form)
-        except wave.Error:
+        except (wave.Error, EOFError):
             try:
                 # attempt to read the file as AIFF
                 self.audio_reader = aifc.open(self.filename_or_fileobject, "rb")
                 self.little_endian = False # AIFF is a big-endian format
-            except aifc.Error:
+            except (aifc.Error, EOFError):
                 # attempt to read the file as FLAC
                 if hasattr(self.filename_or_fileobject, "read"):
                     flac_data = self.filename_or_fileobject.read()
@@ -219,7 +219,7 @@ def __enter__(self):
                     aiff_file = io.BytesIO(aiff_data)
                     try:
                         self.audio_reader = aifc.open(aiff_file, "rb")
-                    except aifc.Error:
+                    except (aifc.Error, EOFError):
                         raise ValueError("Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format")
                     self.little_endian = False # AIFF is a big-endian format
         assert 1 <= self.audio_reader.getnchannels() <= 2, "Audio must be mono or stereo"
@@ -847,7 +847,7 @@ def recognize_bing(self, audio_data, key, language="en-US", show_all=False):

         To get the API key, go to the `Microsoft Cognitive Services subscriptions overview <https://www.microsoft.com/cognitive-services/en-us/subscriptions>`__, go to the entry titled "Speech", and look for the key under the "Keys" column. Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings.

-        The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-4-supported-locales>`__.
+        The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#SupLocales>`__.

         Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-3-voice-recognition-responses>`__ as a JSON dictionary.
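
Reviewer note: the commit message does not say why ``EOFError`` was added to the ``except`` clauses in ``__enter__``. One likely reason is that the standard ``wave`` module raises ``EOFError``, not ``wave.Error``, when the input ends before a chunk header can be read, so an empty or truncated file would otherwise escape the format fallback chain. The sketch below illustrates that behaviour with ``wave`` only. ``open_wav_or_none`` and ``make_wav_bytes`` are hypothetical names used for illustration and do not appear in the library.

```python
import io
import wave


def open_wav_or_none(fileobj):
    """Return a wave reader for fileobj, or None if it cannot be parsed as WAV."""
    try:
        return wave.open(fileobj, "rb")
    except (wave.Error, EOFError):
        # wave.Error: the data does not carry a valid RIFF/WAVE header
        # EOFError: the data ends before a chunk header can be read
        fileobj.seek(0)  # rewind so that another decoder could be tried next
        return None


def make_wav_bytes(frame_count=10):
    """Build a tiny mono 16-bit WAV in memory for demonstration purposes."""
    buffer = io.BytesIO()
    writer = wave.open(buffer, "wb")
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * frame_count)
    writer.close()  # finalizes the header; the caller-supplied buffer stays open
    return buffer.getvalue()
```

With these helpers, ``open_wav_or_none(io.BytesIO(b""))`` returns ``None`` through the ``EOFError`` path, while non-RIFF data returns ``None`` through the ``wave.Error`` path.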

tests/chinese.flac

39.1 KB
Binary file not shown.

tests/english.wav

236 KB
Binary file not shown.

tests/french.aiff

218 KB
Binary file not shown.

tests/test_audio.py

+60
@@ -0,0 +1,60 @@
+#!/usr/bin/env python3
+
+import os
+import wave
+import aifc
+import io
+import subprocess
+import unittest
+
+import speech_recognition as sr
+
+
+class TestAudioFile(unittest.TestCase):
+    def setUp(self):
+        self.AUDIO_FILE_WAV = os.path.join(os.path.dirname(os.path.realpath(__file__)), "english.wav")
+        self.AUDIO_FILE_AIFF = os.path.join(os.path.dirname(os.path.realpath(__file__)), "french.aiff")
+        self.AUDIO_FILE_FLAC = os.path.join(os.path.dirname(os.path.realpath(__file__)), "chinese.flac")
+
+    def test_wav_load(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_WAV) as source: audio = r.record(source)
+        self.assertIsInstance(audio, sr.AudioData)
+        audio_reader = wave.open(self.AUDIO_FILE_WAV, "rb")
+        self.assertEqual(audio.sample_rate, audio_reader.getframerate())
+        self.assertEqual(audio.sample_width, audio_reader.getsampwidth())
+        self.assertEqual(audio.get_raw_data(), audio_reader.readframes(audio_reader.getnframes()))
+        audio_reader.close()
+
+
+    def test_aiff_load(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_AIFF) as source: audio = r.record(source)
+        self.assertIsInstance(audio, sr.AudioData)
+        audio_reader = aifc.open(self.AUDIO_FILE_AIFF, "rb")
+        self.assertEqual(audio.sample_rate, audio_reader.getframerate())
+        self.assertEqual(audio.sample_width, audio_reader.getsampwidth())
+        aiff_data = audio_reader.readframes(audio_reader.getnframes())
+        aiff_data_little_endian = aiff_data[1::-1] + b"".join(aiff_data[i + 2:i:-1] for i in range(1, len(aiff_data), 2))
+        self.assertEqual(audio.get_raw_data(), aiff_data_little_endian)
+        audio_reader.close()
+
+    def test_flac_load(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FLAC) as source: audio = r.record(source)
+        self.assertIsInstance(audio, sr.AudioData)
+        process = subprocess.Popen([sr.get_flac_converter(), "--stdout", "--totally-silent", "--decode", "--force-aiff-format", self.AUDIO_FILE_FLAC], stdout=subprocess.PIPE)
+        aiff_data, _ = process.communicate()
+        aiff_file = io.BytesIO(aiff_data)
+        audio_reader = aifc.open(aiff_file, "rb")
+        self.assertEqual(audio.sample_rate, audio_reader.getframerate())
+        self.assertEqual(audio.sample_width, audio_reader.getsampwidth())
+        aiff_data = audio_reader.readframes(audio_reader.getnframes())
+        aiff_data_little_endian = aiff_data[1::-1] + b"".join(aiff_data[i + 2:i:-1] for i in range(1, len(aiff_data), 2))
+        self.assertEqual(audio.get_raw_data(), aiff_data_little_endian)
+        audio_reader.close()
+        aiff_file.close()
+
+
+if __name__ == "__main__":
+    unittest.main()
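
Reviewer note: the slice expression used in ``test_aiff_load`` and ``test_flac_load`` to build ``aiff_data_little_endian`` swaps each pair of adjacent bytes, which converts big-endian 16-bit samples to little-endian. It is correct only when the sample width is 2 bytes, which appears to be the assumption for the bundled test files. The sketch below shows a more explicit equivalent next to the expression from the test. Both function names are hypothetical and are not part of the commit.

```python
def swap_16bit_byte_order(data):
    """Swap the two bytes of every 16-bit sample in data (hypothetical helper)."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes for 16-bit samples")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # low-order bytes move to even positions
    swapped[1::2] = data[0::2]  # high-order bytes move to odd positions
    return bytes(swapped)


def swap_via_test_expression(data):
    """The same operation, written as it appears in tests/test_audio.py."""
    return data[1::-1] + b"".join(data[i + 2:i:-1] for i in range(1, len(data), 2))
```

For even-length input the two functions return identical bytes. The explicit version also rejects odd-length input instead of silently producing a result.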

tests/test_recognition.py

+44-7
@@ -1,4 +1,5 @@
 #!/usr/bin/env python3
+# -*- coding: utf-8 -*-

 import os
 import unittest
@@ -7,41 +8,77 @@

 class TestRecognition(unittest.TestCase):
     def setUp(self):
-        self.AUDIO_FILE_EN = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "examples", "english.wav")
+        self.AUDIO_FILE_EN = os.path.join(os.path.dirname(os.path.realpath(__file__)), "english.wav")
+        self.AUDIO_FILE_FR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "french.aiff")
+        self.AUDIO_FILE_ZH = os.path.join(os.path.dirname(os.path.realpath(__file__)), "chinese.flac")

-    def test_sphinx(self):
+    def test_sphinx_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_sphinx(audio), "wanted to three")

-    def test_google(self):
+    def test_google_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_google(audio), "one-two-three")

+    def test_google_french(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_google(audio, language="fr-FR"), u"mais c'est la dictée numéro 1")
+
+    def test_google_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_google(audio, language="zh-CN"), u"砸自己的脚")
+
     @unittest.skipUnless("WIT_AI_KEY" in os.environ, "requires Wit.ai key to be specified in WIT_AI_KEY environment variable")
-    def test_wit(self):
+    def test_wit_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_wit(audio, key=os.environ["WIT_AI_KEY"]), "one two three")

     @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable")
-    def test_bing(self):
+    def test_bing_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"]), "one two three")

+    @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable")
+    def test_bing_french(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"], language="fr-FR"), u"et c'est la dictée numéro un")
+
+    @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable")
+    def test_bing_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"], language="zh-CN"), u"砸自己的脚")
+
     @unittest.skipUnless("HOUNDIFY_CLIENT_ID" in os.environ and "HOUNDIFY_CLIENT_KEY" in os.environ, "requires Houndify client ID and client key to be specified in HOUNDIFY_CLIENT_ID and HOUNDIFY_CLIENT_KEY environment variables")
-    def test_houndify(self):
+    def test_houndify_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_houndify(audio, client_id=os.environ["HOUNDIFY_CLIENT_ID"], client_key=os.environ["HOUNDIFY_CLIENT_KEY"]), "one two three")

     @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables")
-    def test_ibm(self):
+    def test_ibm_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"]), "one two three ")

+    @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables")
+    def test_ibm_french(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"], language="fr-FR"), u"si la dictée numéro un ")
+
+    @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables")
+    def test_ibm_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"], language="zh-CN"), u"砸 自己 的 脚 ")
+
 if __name__ == "__main__":
     unittest.main()
