Update documentation, update tests

Uberi · Uberi · commit cdb42b116a44 · 2017-01-03T17:35:46.000-05:00
diff --git a/.gitignore b/.gitignore
@@ -8,3 +8,4 @@ speech_recognition/pocketsphinx-data/zh-CN/
 fr-FR.zip
 zh-CN.zip
 pocketsphinx-python/
+venv/
diff --git a/reference/library-reference.rst b/reference/library-reference.rst
@@ -228,7 +228,7 @@ The Microsoft Bing Voice Recognition API key is specified by ``key``. Unfortunat
 
 To get the API key, go to the `Microsoft Cognitive Services subscriptions overview <https://www.microsoft.com/cognitive-services/en-us/subscriptions>`__, go to the entry titled "Speech", and look for the key under the "Keys" column. Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings.
 
-The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-4-supported-locales>`__.
+The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#SupLocales>`__.
 
 Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-3-voice-recognition-responses>`__ as a JSON dictionary.
 
diff --git a/reference/pocketsphinx.rst b/reference/pocketsphinx.rst
@@ -6,8 +6,8 @@ Installing other languages
 
 By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are also available, but not included due to the files being too large:
 
-* `International French <https://db.tt/tVNcZXao>`__
-* `Mandarin Chinese <https://db.tt/2YQVXmEk>`__
+* `International French <https://www.dropbox.com/s/115e3mf3y21x0b8/fr-FR.zip?dl=1>`__
+* `Mandarin Chinese <https://www.dropbox.com/s/0iwx5ypp9uym66c/zh-CN.zip?dl=1>`__
 
 To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running ``python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"``).
 
@@ -94,7 +94,7 @@ Notes on building the language data from source
 * International French: ``/speech_recognition/pocketsphinx-data/fr-FR/``:
     * ``/speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin`` is ``fr-small.lm.bin`` from the `Sphinx French language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/>`__.
     * ``/speech_recognition/pocketsphinx-data/fr-FR/pronounciation-dictionary.dict`` is ``fr.dict`` from the `Sphinx French language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/>`__.
-    * ``/speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/`` is extracted from ``cmusphinx-fr-5.2.tar.gz`` in the `Sphinx French acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/>`__.
+    * ``/speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/`` contains all of the files extracted from ``cmusphinx-fr-5.2.tar.gz`` in the `Sphinx French acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/>`__.
     * To get better French recognition accuracy at the expense of higher disk space and RAM usage:
         1. Download ``fr.lm.gmp`` from the `Sphinx French language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/>`__.
         2. Convert from DMP (an obsolete Sphinx binary format) to ARPA format: ``sphinx_lm_convert -i fr.lm.gmp -o french.lm.bin``.
@@ -107,5 +107,13 @@ Notes on building the language data from source
         4. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i chinese.lm -o chinese.lm.bin``.
         5. Replace ``/speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin`` with ``chinese.lm.bin`` created in the previous step.
     * ``/speech_recognition/pocketsphinx-data/zh-CN/pronounciation-dictionary.dict`` is ``zh_broadcastnews_utf8.dic`` from the `Sphinx Mandarin language model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Language%20Model/>`__.
-    * ``/speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/`` is extracted from ``zh_broadcastnews_16k_ptm256_8000.tar.bz2`` in the `Sphinx Mandarin acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Broadcast%20News%20acoustic%20models/>`__.
-    * To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 when preparing ``zh_broadcastnews_64000_utf8.DMP``.
+    * ``/speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/`` contains all of the files extracted from ``zh_broadcastnews_16k_ptm256_8000.tar.bz2`` in the `Sphinx Mandarin acoustic model <http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Broadcast%20News%20acoustic%20models/>`__.
+    * To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 when preparing ``zh_broadcastnews_64000_utf8.DMP``.
+* Italian: ``/speech_recognition/pocketsphinx-data/it-IT/``:
+    * ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` is generated as follows:
+        1. Download ``cmusphinx-it-5.2.tar.gz`` from the `Sphinx Italian language model <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Italian/>`__.
+        2. Extract ``/etc/voxforge_it_sphinx.lm`` from ``cmusphinx-it-5.2.tar.gz`` as ``italian.lm``.
+        3. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i italian.lm -o italian.lm.bin``.
+        4. Replace ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` with ``italian.lm.bin`` created in the previous step.
+    * ``/speech_recognition/pocketsphinx-data/it-IT/pronounciation-dictionary.dict`` is ``/etc/voxforge_it_sphinx.dic`` from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Italian/>`__).
+    * ``/speech_recognition/pocketsphinx-data/it-IT/acoustic-model/`` contains all of the files in ``/model_parameters`` extracted from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Italian/>`__).
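The installation step described in `reference/pocketsphinx.rst` (download the ZIP archive and extract it directly into the module install directory) could be scripted roughly as follows. This is a minimal sketch and not part of the commit: `install_language_pack` is a hypothetical helper, and the install directory is assumed to be the one printed by the `python -c` command quoted in the documentation.

```python
import os
import zipfile


def install_language_pack(zip_path, install_dir):
    """Extract a language pack ZIP directly into install_dir.

    Hypothetical helper for illustration only. install_dir is assumed to be
    the module install directory, for example the value of
    os.path.dirname(speech_recognition.__file__).
    Returns the list of archive member names that were extracted.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(install_dir)
        return archive.namelist()
```

A call might look like `install_language_pack("fr-FR.zip", install_dir)`, where `install_dir` is obtained as described above. Writing into the install directory may require elevated permissions, depending on how the module was installed.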
diff --git a/speech_recognition/__init__.py b/speech_recognition/__init__.py
@@ -195,12 +195,12 @@ def __enter__(self):
             # attempt to read the file as WAV
             self.audio_reader = wave.open(self.filename_or_fileobject, "rb")
             self.little_endian = True  # RIFF WAV is a little-endian format (most ``audioop`` operations assume that the frames are stored in little-endian form)
-        except wave.Error:
+        except (wave.Error, EOFError):
             try:
                 # attempt to read the file as AIFF
                 self.audio_reader = aifc.open(self.filename_or_fileobject, "rb")
                 self.little_endian = False  # AIFF is a big-endian format
-            except aifc.Error:
+            except (aifc.Error, EOFError):
                 # attempt to read the file as FLAC
                 if hasattr(self.filename_or_fileobject, "read"):
                     flac_data = self.filename_or_fileobject.read()
@@ -219,7 +219,7 @@ def __enter__(self):
                 aiff_file = io.BytesIO(aiff_data)
                 try:
                     self.audio_reader = aifc.open(aiff_file, "rb")
-                except aifc.Error:
+                except (aifc.Error, EOFError):
                     raise ValueError("Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format")
                 self.little_endian = False  # AIFF is a big-endian format
         assert 1 <= self.audio_reader.getnchannels() <= 2, "Audio must be mono or stereo"
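The reason for also catching `EOFError` in the hunks above can be illustrated in isolation. The sketch below uses only the standard-library `wave` module; `can_open_as_wav` and `make_silent_wav` are hypothetical names introduced for this example and do not exist in `speech_recognition`. It assumes that `wave.open` raises `EOFError` when the stream ends before a chunk header can be read, and `wave.Error` when the data is present but is not a RIFF WAV file.

```python
import io
import wave


def can_open_as_wav(file_object):
    """Return True if the standard-library wave module can parse file_object."""
    try:
        reader = wave.open(file_object, "rb")
    except (wave.Error, EOFError):
        # wave.Error: data is present but is not a supported RIFF WAV file
        # EOFError: the stream ended before a chunk header could be read
        return False
    reader.close()
    return True


def make_silent_wav(frame_count=4):
    """Build a tiny 16-bit mono WAV file in memory (illustrative test data)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(16000)
        writer.writeframes(b"\x00\x00" * frame_count)
    return buffer.getvalue()
```

With only `wave.Error` in the `except` clause, an empty or truncated input would propagate `EOFError` to the caller instead of falling through to the next format, which appears to be the case the commit addresses.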
@@ -847,7 +847,7 @@ def recognize_bing(self, audio_data, key, language="en-US", show_all=False):
 
         To get the API key, go to the `Microsoft Cognitive Services subscriptions overview <https://www.microsoft.com/cognitive-services/en-us/subscriptions>`__, go to the entry titled "Speech", and look for the key under the "Keys" column. Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings.
 
-        The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-4-supported-locales>`__.
+        The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#SupLocales>`__.
 
         Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://www.microsoft.com/cognitive-services/en-us/speech-api/documentation/api-reference-rest/BingVoiceRecognition#user-content-3-voice-recognition-responses>`__ as a JSON dictionary.
 
diff --git a/tests/chinese.flac b/tests/chinese.flac
diff --git a/tests/english.wav b/tests/english.wav
diff --git a/tests/french.aiff b/tests/french.aiff
diff --git a/tests/test_audio.py b/tests/test_audio.py
@@ -0,0 +1,60 @@
+#!/usr/bin/env python3
+
+import os
+import wave
+import aifc
+import io
+import subprocess
+import unittest
+
+import speech_recognition as sr
+
+
+class TestAudioFile(unittest.TestCase):
+    def setUp(self):
+        self.AUDIO_FILE_WAV = os.path.join(os.path.dirname(os.path.realpath(__file__)), "english.wav")
+        self.AUDIO_FILE_AIFF = os.path.join(os.path.dirname(os.path.realpath(__file__)), "french.aiff")
+        self.AUDIO_FILE_FLAC = os.path.join(os.path.dirname(os.path.realpath(__file__)), "chinese.flac")
+
+    def test_wav_load(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_WAV) as source: audio = r.record(source)
+        self.assertIsInstance(audio, sr.AudioData)
+        audio_reader = wave.open(self.AUDIO_FILE_WAV, "rb")
+        self.assertEqual(audio.sample_rate, audio_reader.getframerate())
+        self.assertEqual(audio.sample_width, audio_reader.getsampwidth())
+        self.assertEqual(audio.get_raw_data(), audio_reader.readframes(audio_reader.getnframes()))
+        audio_reader.close()
+
+
+    def test_aiff_load(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_AIFF) as source: audio = r.record(source)
+        self.assertIsInstance(audio, sr.AudioData)
+        audio_reader = aifc.open(self.AUDIO_FILE_AIFF, "rb")
+        self.assertEqual(audio.sample_rate, audio_reader.getframerate())
+        self.assertEqual(audio.sample_width, audio_reader.getsampwidth())
+        aiff_data = audio_reader.readframes(audio_reader.getnframes())
+        aiff_data_little_endian = aiff_data[1::-1] + b"".join(aiff_data[i + 2:i:-1] for i in range(1, len(aiff_data), 2))
+        self.assertEqual(audio.get_raw_data(), aiff_data_little_endian)
+        audio_reader.close()
+
+    def test_flac_load(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FLAC) as source: audio = r.record(source)
+        self.assertIsInstance(audio, sr.AudioData)
+        process = subprocess.Popen([sr.get_flac_converter(), "--stdout", "--totally-silent", "--decode", "--force-aiff-format", self.AUDIO_FILE_FLAC], stdout=subprocess.PIPE)
+        aiff_data, _ = process.communicate()
+        aiff_file = io.BytesIO(aiff_data)
+        audio_reader = aifc.open(aiff_file, "rb")
+        self.assertEqual(audio.sample_rate, audio_reader.getframerate())
+        self.assertEqual(audio.sample_width, audio_reader.getsampwidth())
+        aiff_data = audio_reader.readframes(audio_reader.getnframes())
+        aiff_data_little_endian = aiff_data[1::-1] + b"".join(aiff_data[i + 2:i:-1] for i in range(1, len(aiff_data), 2))
+        self.assertEqual(audio.get_raw_data(), aiff_data_little_endian)
+        audio_reader.close()
+        aiff_file.close()
+
+
+if __name__ == "__main__":
+    unittest.main()
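The slicing expression that builds `aiff_data_little_endian` in `test_aiff_load` and `test_flac_load` swaps each pair of bytes, which converts big-endian samples to little-endian only when every sample is 2 bytes wide. A possibly more readable equivalent is sketched below; `swap_16bit_byte_order` is a hypothetical helper, not something the commit or the library defines, and it assumes 16-bit samples.

```python
def swap_16bit_byte_order(data):
    """Swap the two bytes of every 16-bit sample in data.

    Assumes 2-byte samples, so len(data) must be even.
    """
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes for 16-bit samples")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each sample moves first
    swapped[1::2] = data[0::2]  # first byte of each sample moves second
    return bytes(swapped)
```

For audio with a different sample width, the swap would need to reverse groups of `sample_width` bytes instead of pairs.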
diff --git a/tests/test_recognition.py b/tests/test_recognition.py
@@ -1,4 +1,5 @@
 #!/usr/bin/env python3
+# -*- coding: utf-8 -*-
 
 import os
 import unittest
@@ -7,41 +8,77 @@
 
 class TestRecognition(unittest.TestCase):
     def setUp(self):
-        self.AUDIO_FILE_EN = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "examples", "english.wav")
+        self.AUDIO_FILE_EN = os.path.join(os.path.dirname(os.path.realpath(__file__)), "english.wav")
+        self.AUDIO_FILE_FR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "french.aiff")
+        self.AUDIO_FILE_ZH = os.path.join(os.path.dirname(os.path.realpath(__file__)), "chinese.flac")
 
-    def test_sphinx(self):
+    def test_sphinx_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_sphinx(audio), "wanted to three")
 
-    def test_google(self):
+    def test_google_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_google(audio), "one-two-three")
 
+    def test_google_french(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_google(audio, language="fr-FR"), u"mais c'est la dictée numéro 1")
+
+    def test_google_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_google(audio, language="zh-CN"), u"砸自己的脚")
+
     @unittest.skipUnless("WIT_AI_KEY" in os.environ, "requires Wit.ai key to be specified in WIT_AI_KEY environment variable")
-    def test_wit(self):
+    def test_wit_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_wit(audio, key=os.environ["WIT_AI_KEY"]), "one two three")
 
     @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable")
-    def test_bing(self):
+    def test_bing_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"]), "one two three")
 
+    @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable")
+    def test_bing_french(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"], language="fr-FR"), u"et c'est la dictée numéro un")
+
+    @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable")
+    def test_bing_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"], language="zh-CN"), u"砸自己的脚")
+
     @unittest.skipUnless("HOUNDIFY_CLIENT_ID" in os.environ and "HOUNDIFY_CLIENT_KEY" in os.environ, "requires Houndify client ID and client key to be specified in HOUNDIFY_CLIENT_ID and HOUNDIFY_CLIENT_KEY environment variables")
-    def test_houndify(self):
+    def test_houndify_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_houndify(audio, client_id=os.environ["HOUNDIFY_CLIENT_ID"], client_key=os.environ["HOUNDIFY_CLIENT_KEY"]), "one two three")
 
     @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables")
-    def test_ibm(self):
+    def test_ibm_english(self):
         r = sr.Recognizer()
         with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source)
         self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"]), "one two three ")
 
+    @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables")
+    def test_ibm_french(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"], language="fr-FR"), u"si la dictée numéro un ")
+
+    @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables")
+    def test_ibm_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"], language="zh-CN"), u"砸 自己 的 脚 ")
+
 if __name__ == "__main__":
     unittest.main()
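The key-gated tests above rely on `unittest.skipUnless`, whose condition is evaluated once, when the class body is executed. The following sketch shows that behaviour with a made-up environment variable; `SR_EXAMPLE_HYPOTHETICAL_KEY`, `KeyGatedExample`, and `run_example` are illustrative names only and are not used by the project.

```python
import os
import unittest


class KeyGatedExample(unittest.TestCase):
    # The condition is checked at class definition time, not when the test runs.
    @unittest.skipUnless("SR_EXAMPLE_HYPOTHETICAL_KEY" in os.environ,
                         "requires SR_EXAMPLE_HYPOTHETICAL_KEY environment variable")
    def test_needs_key(self):
        self.assertIn("SR_EXAMPLE_HYPOTHETICAL_KEY", os.environ)


def run_example():
    """Run the example test case and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KeyGatedExample)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

When the variable is unset, the test is reported as skipped and not as a failure, so the suite can still pass on machines without API credentials.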