Commit d2de540

committed Apr 3, 2016
Add AIFF and FLAC reading support, improve audio handling, add compat shims
1 parent ea8cd08 commit d2de540

10 files changed: +194 -75 lines changed

‎README.rst

+2 -2
@@ -55,8 +55,8 @@ Examples
 See the ``examples/`` directory for usage examples:
 
 - `Recognize speech input from the microphone <https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py>`__
-- `Transcribe a WAV audio file <https://github.com/Uberi/speech_recognition/blob/master/examples/wav_transcribe.py>`__
-- `Save audio data to a WAV file <https://github.com/Uberi/speech_recognition/blob/master/examples/write_audio.py>`__
+- `Transcribe an audio file <https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py>`__
+- `Save audio data to an audio file <https://github.com/Uberi/speech_recognition/blob/master/examples/write_audio.py>`__
 - `Show extended recognition results <https://github.com/Uberi/speech_recognition/blob/master/examples/extended_results.py>`__
 - `Calibrate the recognizer energy threshold for ambient noise levels <https://github.com/Uberi/speech_recognition/blob/master/examples/calibrate_energy_threshold.py>`__ (see ``recognizer_instance.energy_threshold`` for details)
 - `Listening to a microphone in the background <https://github.com/Uberi/speech_recognition/blob/master/examples/background_listening.py>`__

examples/wav_transcribe.py → examples/audio_transcribe.py

+6 -4
@@ -4,12 +4,14 @@
 
 # obtain path to "english.wav" in the same folder as this script
 from os import path
-WAV_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
+AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
+#AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
+#AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")
 
-# use "english.wav" as the audio source
+# use the audio file as the audio source
 r = sr.Recognizer()
-with sr.WavFile(WAV_FILE) as source:
-    audio = r.record(source) # read the entire WAV file
+with sr.AudioFile(AUDIO_FILE) as source:
+    audio = r.record(source) # read the entire audio file
 
 # recognize speech using Sphinx
 try:
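
The updated example can point ``AUDIO_FILE`` at a WAV, AIFF, or FLAC file. As a rough illustration of how these containers differ on disk, the sketch below guesses the format from the first 12 bytes of a stream. This is not how the library itself detects formats (the diff does not show that); ``sniff_audio_container`` is a hypothetical helper name, and only the well-known magic numbers of each container are assumed.

```python
import io


def sniff_audio_container(stream):
    """Guess the audio container of a seekable binary stream from its header.

    Hypothetical helper for illustration only; not part of speech_recognition.
    """
    start = stream.tell()
    header = stream.read(12)
    stream.seek(start)  # leave the stream position unchanged for the caller

    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"  # covers both plain AIFF and AIFF-C
    if header[:4] == b"fLaC":
        return "flac"  # native FLAC stream marker
    if header[:4] == b"OggS":
        return "ogg"  # Ogg container, e.g. OGG-FLAC
    return None


if __name__ == "__main__":
    fake_wav = io.BytesIO(b"RIFF\x00\x00\x00\x00WAVEfmt ")
    print(sniff_audio_container(fake_wav))
```

A check like this could run before handing a file to ``sr.AudioFile``, for example to reject Ogg containers early.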

‎examples/chinese.flac

39.1 KB
Binary file not shown.

‎examples/chinese.wav

-167 KB
Binary file not shown.

‎examples/extended_results.py

+6 -4
@@ -4,12 +4,14 @@
 
 # obtain path to "english.wav" in the same folder as this script
 from os import path
-WAV_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
+AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
+#AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
+#AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")
 
-# use "english.wav" as the audio source
+# use the audio file as the audio source
 r = sr.Recognizer()
-with sr.WavFile(WAV_FILE) as source:
-    audio = r.record(source) # read the entire WAV file
+with sr.AudioFile(AUDIO_FILE) as source:
+    audio = r.record(source) # read the entire audio file
 
 # recognize speech using Sphinx
 try:

‎examples/french.aiff

218 KB
Binary file not shown.

‎examples/french.wav

-406 KB
Binary file not shown.

‎examples/write_audio.py

+12
@@ -10,6 +10,18 @@
     print("Say something!")
     audio = r.listen(source)
 
+# write audio to a RAW file
+with open("microphone-results.raw", "wb") as f:
+    f.write(audio.get_raw_data())
+
 # write audio to a WAV file
 with open("microphone-results.wav", "wb") as f:
     f.write(audio.get_wav_data())
+
+# write audio to an AIFF file
+with open("microphone-results.aiff", "wb") as f:
+    f.write(audio.get_aiff_data())
+
+# write audio to a FLAC file
+with open("microphone-results.flac", "wb") as f:
+    f.write(audio.get_flac_data())
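
The example writes the same recording both as a RAW file and as a WAV file. To show what separates the two, the sketch below wraps header-less PCM samples in a WAV container using Python's standard ``wave`` module. It is an illustration only and not the library's implementation of ``get_wav_data``; ``pcm_to_wav_bytes`` and its parameters are hypothetical names, and the sample rate and width used in the demo are assumed values.

```python
import io
import wave


def pcm_to_wav_bytes(raw_pcm, sample_rate, sample_width, channels=1):
    """Wrap raw PCM frames in a WAV container and return the file contents.

    Hypothetical helper for illustration; sample_width is in bytes per sample.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_writer:
        wav_writer.setnchannels(channels)
        wav_writer.setsampwidth(sample_width)
        wav_writer.setframerate(sample_rate)
        wav_writer.writeframes(raw_pcm)  # header sizes are patched on close
    return buffer.getvalue()


if __name__ == "__main__":
    # One second of 16-bit mono silence at an assumed 16 kHz sample rate.
    silence = b"\x00\x00" * 16000
    wav_bytes = pcm_to_wav_bytes(silence, sample_rate=16000, sample_width=2)
    with open("silence.wav", "wb") as f:
        f.write(wav_bytes)
```

A RAW file carries no sample rate or sample width, so whoever reads it back has to know those values from elsewhere; the WAV header records them alongside the samples.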

‎reference/library-reference.rst

+25 -10
@@ -42,29 +42,33 @@ To create a ``Microphone`` instance by name:
         if microphone_name == "HDA Intel HDMI: 0 (hw:0,3)":
             m = Microphone(i)
 
-``WavFile(filename_or_fileobject)``
+``AudioFile(filename_or_fileobject)``
 -----------------------------------
 
-Creates a new ``WavFile`` instance given a WAV audio file ``filename_or_fileobject``. Subclass of ``AudioSource``.
+Creates a new ``AudioFile`` instance given a WAV/AIFF/FLAC audio file `filename_or_fileobject`. Subclass of ``AudioSource``.
 
-If ``filename_or_fileobject`` is a string, then it is interpreted as a path to a WAV audio file (mono or stereo) on the filesystem. Otherwise, ``filename_or_fileobject`` should be a file-like object such as ``io.BytesIO`` or similar.
+If ``filename_or_fileobject`` is a string, then it is interpreted as a path to an audio file on the filesystem. Otherwise, ``filename_or_fileobject`` should be a file-like object such as ``io.BytesIO`` or similar.
 
-Note that using functions that read from the audio (such as ``recognizer_instance.record`` or ``recognizer_instance.listen``) will move ahead in the stream. For example, if you execute ``recognizer_instance.record(wavfile_instance, duration=10)`` twice, the first time it will return the first 10 seconds of audio, and the second time it will return the 10 seconds of audio right after that.
+Note that functions that read from the audio (such as ``recognizer_instance.record`` or ``recognizer_instance.listen``) will move ahead in the stream. For example, if you execute ``recognizer_instance.record(audiofile_instance, duration=10)`` twice, the first time it will return the first 10 seconds of audio, and the second time it will return the 10 seconds of audio right after that. This is always reset when entering the context with a context manager.
 
-Note that the WAV file must be in PCM/LPCM format; WAVE_FORMAT_EXTENSIBLE and compressed WAV are not supported and may result in undefined behaviour.
+WAV files must be in PCM/LPCM format; WAVE_FORMAT_EXTENSIBLE and compressed WAV are not supported and may result in undefined behaviour.
+
+Both AIFF and AIFF-C (compressed AIFF) formats are supported.
+
+FLAC files must be in native FLAC format; OGG-FLAC is not supported and may result in undefined behaviour.
 
 Instances of this class are context managers, and are designed to be used with ``with`` statements:
 
 .. code:: python
 
     import speech_recognition as sr
-    with sr.WavFile("SOMETHING.wav") as source: # open the WAV file for reading
-        pass # do things here - ``source`` is the WavFile instance created above
+    with sr.AudioFile("SOME_AUDIO_FILE") as source: # open the audio file for reading
+        pass # do things here - ``source`` is the AudioFile instance created above
 
-``wavfile_instance.DURATION``
+``audiofile_instance.DURATION``
 -----------------------------
 
-Represents the length of the audio stored in the WAV file in seconds. This property is only available when inside a context - essentially, that means it should only be accessed inside a ``with wavfile_instance ...`` statement. Outside of contexts, this property is ``None``.
+Represents the length of the audio stored in the audio file in seconds. This property is only available when inside a context - essentially, that means it should only be accessed inside the body of a ``with audiofile_instance ...`` statement. Outside of contexts, this property is ``None``.
 
 This is useful when combined with the ``offset`` parameter of ``recognizer_instance.record``, since when together it is possible to perform speech recognition in chunks.
 
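
The documentation above mentions recognizing speech in chunks by combining ``DURATION`` with the ``offset`` parameter of ``recognizer_instance.record``. A minimal sketch of the bookkeeping involved follows; ``plan_chunks`` is a hypothetical helper rather than part of the library, and the commented usage at the end only assumes the ``duration`` argument and stream-advancing behaviour described in the documentation text above.

```python
def plan_chunks(total_duration, chunk_seconds):
    """Split total_duration seconds into (offset, length) pairs.

    Hypothetical helper for illustration; the last chunk may be shorter.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    offset = 0.0
    while offset < total_duration:
        length = min(chunk_seconds, total_duration - offset)
        chunks.append((offset, length))
        offset += chunk_seconds
    return chunks


if __name__ == "__main__":
    for offset, length in plan_chunks(25, 10):
        print(offset, length)

    # Possible use with the library (not executed here, untested sketch):
    #
    #   with sr.AudioFile("SOME_AUDIO_FILE") as source:
    #       for _offset, length in plan_chunks(source.DURATION, 10):
    #           # each call continues where the previous one stopped
    #           audio = r.record(source, duration=length)
```

Because reads advance through the stream inside a single ``with`` block, the offsets are only needed when each chunk is read in a fresh context, where the position starts from the beginning again.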
@@ -248,7 +252,7 @@ Raises a ``speech_recognition.UnknownValueError`` exception if the speech is uni
 
 Base class representing audio sources. Do not instantiate.
 
-Instances of subclasses of this class, such as ``Microphone`` and ``WavFile``, can be passed to things like ``recognizer_instance.record`` and ``recognizer_instance.listen``.
+Instances of subclasses of this class, such as ``Microphone`` and ``AudioFile``, can be passed to things like ``recognizer_instance.record`` and ``recognizer_instance.listen``.
 
 ``AudioData``
 -------------
@@ -279,6 +283,17 @@ If ``convert_rate`` is specified and the audio sample rate is not ``convert_rate
 
 Writing these bytes directly to a file results in a valid `WAV file <https://en.wikipedia.org/wiki/WAV>`__.
 
+``audiodata_instance.get_aiff_data(convert_rate = None, convert_width = None)``
+-------------------------------------------------------------------------------
+
+Returns a byte string representing the contents of an AIFF-C file containing the audio represented by the ``AudioData`` instance.
+
+If ``convert_width`` is specified and the audio samples are not ``convert_width`` bytes each, the resulting audio is converted to match.
+
+If ``convert_rate`` is specified and the audio sample rate is not ``convert_rate`` Hz, the resulting audio is resampled to match.
+
+Writing these bytes directly to a file results in a valid `AIFF-C file <https://en.wikipedia.org/wiki/Audio_Interchange_File_Format>`__.
+
 ``audiodata_instance.get_flac_data(convert_rate = None, convert_width = None)``
 -------------------------------------------------------------------------------
 