Fix some typos, update the FAQ, and change the behaviour of listen_in_background (BACKWARDS INCOMPATIBLE - bumped major version).
Uberi committed Jul 12, 2015
1 parent b2b1130 commit 6bbcf24
Showing 3 changed files with 81 additions and 44 deletions.
71 changes: 49 additions & 22 deletions README.rst
@@ -30,13 +30,15 @@ Links:

Quickstart: ``pip install SpeechRecognition``. See the "Installing" section for more details.

To quickly try it out, run ``python -m speech_recognition`` after installing.

How to cite this library (APA style):

Zhang, A. (2015). Speech Recognition (Version 1.5) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
Zhang, A. (2015). Speech Recognition (Version 2.0) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.

How to cite this library (Chicago style):

Zhang, Anthony. 2015. *Speech Recognition* (version 1.5).
Zhang, Anthony. 2015. *Speech Recognition* (version 2.0).

Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaiduYuyin>`__, which is based on this project.

@@ -89,38 +91,41 @@ Transcribe a WAV audio file and show the confidence of each possibility:
    except LookupError: # speech is unintelligible
        print("Could not understand audio")

Listening to a microphone in the background:

.. code:: python

    import speech_recognition as sr
    def callback(recognizer, audio): # this is called from the background thread
        try:
            print("You said " + recognizer.recognize(audio)) # received audio data, now need to recognize it
        except LookupError:
            print("Oops! Didn't catch that")
    r = sr.Recognizer()
    r.listen_in_background(sr.Microphone(), callback)
    import time
    for _ in range(10000): time.sleep(0.1) # we're still listening even though the main thread is blocked
    # when the loop stops, the program will exit and stop listening

Calibrate the recognizer energy threshold (see ``recognizer_instance.energy_threshold``) for ambient noise levels:

.. code:: python

    import speech_recognition as sr
    r = sr.Recognizer()
    with sr.Microphone() as source: # use the default microphone as the audio source
        audio = r.adjust_for_ambient_noise(source) # listen for 1 second to calibrate the energy threshold for ambient noise levels
        r.adjust_for_ambient_noise(source) # listen for 1 second to calibrate the energy threshold for ambient noise levels
        audio = r.listen(source) # now when we listen, the energy threshold is already set to a good value, and we can reliably catch speech right away
    try:
        print("You said " + r.recognize(audio)) # recognize speech using Google Speech Recognition
    except LookupError: # speech is unintelligible
        print("Could not understand audio")

Listening to a microphone in the background:

.. code:: python

    import speech_recognition as sr
    def callback(recognizer, audio): # this is called from the background thread
        try:
            print("You said " + recognizer.recognize(audio)) # received audio data, now need to recognize it
        except LookupError:
            print("Oops! Didn't catch that")
    r = sr.Recognizer()
    m = sr.Microphone()
    with m as source: r.adjust_for_ambient_noise(source) # we only need to calibrate once, before we start listening
    stop_listening = r.listen_in_background(m, callback)
    import time
    for _ in range(50): time.sleep(0.1) # we're still listening even though the main thread is blocked - loop runs for about 5 seconds
    stop_listening() # call the stop function to stop the background thread
    while True: time.sleep(0.1) # the background thread stops soon after we call the stop function

Installing
----------

@@ -219,6 +224,28 @@ PyInstaller doesn't know that the FLAC converters need to be bundled with the ap
3. When building the project using something like ``pyinstaller SOME_SCRIPT.py``, simply supply the ``--additional-hooks-dir`` option set to the PyInstaller hooks folder. For example, ``pyinstaller --additional-hooks-dir pyinstaller-hooks/ SOME_SCRIPT.py``.

On Ubuntu/Debian, I get errors like "jack server is not running or cannot be started" or "Cannot lock down [...] byte memory area (Cannot allocate memory)".
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Linux audio stack is pretty fickle. There are a few things that can cause these issues.

First, make sure JACK is installed. To install it, run ``sudo apt-get install multimedia-jack``.

You will then want to configure the JACK daemon correctly to avoid that "Cannot allocate memory" error. Run ``sudo dpkg-reconfigure -p high jackd2`` and select "Yes" to do so.

Now, you will want to make sure your current user is in the ``audio`` group. You can add your current user to this group by running ``sudo adduser $(whoami) audio``.

Unfortunately, these changes will require you to reboot before they take effect.

After rebooting, run ``pulseaudio --kill``, followed by ``jack_control start``, to fix the "jack server is not running or cannot be started" error.

On Ubuntu/Debian, I get annoying output in the terminal saying things like "bt_audio_service_open: [...] Connection refused" and various others.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "bt_audio_service_open" error means that a Bluetooth audio device is configured but no physical device is currently connected, so it can't be used. If you're not using a Bluetooth microphone, this can be safely ignored. If you are, and audio isn't working, double-check that your microphone is actually connected. There does not seem to be a simple way to disable these messages.

For errors of the form "ALSA lib [...] Unknown PCM", see `this StackOverflow answer <http://stackoverflow.com/questions/7088672/pyaudio-working-but-spits-out-error-messages-each-time>`__. Basically, to get rid of an error of the form "Unknown PCM cards.pcm.rear", simply comment out ``pcm.rear cards.pcm.rear`` in ``/usr/share/alsa/alsa.conf``, ``~/.asoundrc``, and ``/etc/asound.conf``.

Reference
---------

@@ -348,14 +375,14 @@ Records a single phrase from ``source`` (an ``AudioSource`` instance) into an ``

This is done by waiting until the audio has an energy above ``recognizer_instance.energy_threshold`` (the user has started speaking), and then recording until it encounters ``recognizer_instance.pause_threshold`` seconds of silence or there is no more audio input. The ending silence is not included.

The ``timeout`` parameter is the maximum number of seconds that it will wait for a phrase to start before giving up and throwing a ``TimeoutException`` exception. If ``None``, it will wait indefinitely.
The ``timeout`` parameter is the maximum number of seconds that it will wait for a phrase to start before giving up and throwing a ``TimeoutError`` exception. If ``None``, it will wait indefinitely.
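
The mechanism described above can be sketched without any audio hardware. This is an illustrative sketch only, not the library's implementation: it works on a list of precomputed per-chunk energy values, and ``detect_phrase``, ``PhraseTimeout``, and ``seconds_per_chunk`` are hypothetical names introduced here.

```python
class PhraseTimeout(Exception):
    """Hypothetical stand-in for the timeout error described above."""


def detect_phrase(energies, energy_threshold, pause_threshold,
                  seconds_per_chunk, timeout=None):
    """Return (start, end) chunk indices of the first phrase (end exclusive).

    `energies` is an assumed list of per-chunk energy values.
    Returns None if the input ends before any phrase starts.
    """
    # Phase 1: wait for the first chunk louder than the threshold.
    start = None
    for i, energy in enumerate(energies):
        if timeout is not None and i * seconds_per_chunk >= timeout:
            raise PhraseTimeout("no phrase started within the timeout")
        if energy > energy_threshold:
            start = i
            break
    if start is None:
        return None

    # Phase 2: record until pause_threshold seconds of continuous silence,
    # or until the input runs out.
    silent_chunks = 0
    end = len(energies)  # input ran out: keep everything recorded so far
    for i in range(start, len(energies)):
        if energies[i] > energy_threshold:
            silent_chunks = 0
        else:
            silent_chunks += 1
            if silent_chunks * seconds_per_chunk >= pause_threshold:
                end = i - silent_chunks + 1  # drop the ending silence
                break
    return start, end
```

With half-second chunks, a threshold of 300, and a one-second pause threshold, the short dip inside the phrase is kept while the final run of quiet chunks ends the recording and is excluded from the result.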

``recognizer_instance.listen_in_background(source, callback)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Spawns a thread to repeatedly record phrases from ``source`` (an ``AudioSource`` instance) into an ``AudioData`` instance and call ``callback`` with that ``AudioData`` instance as soon as each phrase is detected.

Returns the thread (a ``threading.Thread`` instance) immediately, while the background thread continues to run in parallel. This thread is a daemon and will not stop the program from exiting if there are no other non-daemon threads.
Returns a function object that, when called, stops the background listener thread. The background thread is a daemon and will not stop the program from exiting if there are no other non-daemon threads.

Phrase recognition uses the exact same mechanism as ``recognizer_instance.listen(source)``.
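
The stop-function pattern can be tried without a microphone. The sketch below is a simplified illustration under stated assumptions, not the library's code: ``run_in_background`` is a hypothetical helper, and ``poll`` is a hypothetical stand-in for a listen call with a short timeout that raises ``TimeoutError`` when nothing arrives (it relies on the Python 3 built-in ``TimeoutError``).

```python
import threading


def run_in_background(poll, callback, poll_timeout=0.05):
    """Call poll() repeatedly in a daemon thread and return a stop function.

    `poll(timeout)` is assumed to return an item, or raise TimeoutError
    when nothing arrived within `timeout` seconds.
    """
    running = [True]  # mutable cell shared by both closures below

    def worker():
        while running[0]:
            try:
                item = poll(poll_timeout)
            except TimeoutError:
                continue  # nothing arrived; loop around and re-check the flag
            if running[0]:  # skip the callback if we were stopped meanwhile
                callback(item)

    def stopper():
        running[0] = False

    thread = threading.Thread(target=worker)
    thread.daemon = True  # do not keep the program alive on exit
    thread.start()
    return stopper
```

Because the worker only checks the flag between polls, it stops within roughly one ``poll_timeout`` after the stop function is called rather than immediately, which matches the "stops soon after" wording in the README example.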

37 changes: 15 additions & 22 deletions speech_recognition/__init__.py
@@ -3,7 +3,7 @@
"""Library for performing speech recognition with the Google Speech Recognition API."""

__author__ = "Anthony Zhang (Uberi)"
__version__ = "1.5.0"
__version__ = "2.0.0"
__license__ = "BSD"

import io, os, subprocess, wave
@@ -231,7 +231,6 @@ def adjust_for_ambient_noise(self, source, duration = 1):

            # check if the audio input has stopped being quiet
            energy = audioop.rms(buffer, source.SAMPLE_WIDTH) # energy of the audio signal
            if energy > self.energy_threshold: break

            # dynamically adjust the energy threshold using asymmetric weighted average
            damping = self.dynamic_energy_adjustment_damping ** seconds_per_buffer # account for different chunk sizes and rates
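
Editorial note: the hunk ends before the line that applies ``damping``. The following is a hedged sketch of how a per-second damping factor raised to the buffer duration can drive a weighted-average update. The function name, the ``ratio`` parameter, and the default values are assumptions made for illustration, not taken from this diff.

```python
def adjust_threshold(threshold, energy, seconds_per_buffer,
                     damping_per_second=0.15, ratio=1.5):
    """Move `threshold` toward `energy * ratio`, scaled to the buffer duration.

    `damping_per_second` and `ratio` are assumed illustrative values.
    """
    # Raising the per-second damping to the buffer duration makes the update
    # independent of chunk size: two half-second steps equal one full second.
    damping = damping_per_second ** seconds_per_buffer
    target = energy * ratio
    return threshold * damping + target * (1 - damping)
```

The exponent is what the "account for different chunk sizes and rates" comment refers to: a shorter buffer gives a damping factor closer to 1, so each individual update moves the threshold less.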
@@ -244,7 +243,7 @@ def listen(self, source, timeout = None):
        This is done by waiting until the audio has an energy above ``recognizer_instance.energy_threshold`` (the user has started speaking), and then recording until it encounters ``recognizer_instance.pause_threshold`` seconds of silence or there is no more audio input. The ending silence is not included.
        The ``timeout`` parameter is the maximum number of seconds that it will wait for a phrase to start before giving up and throwing a ``TimeoutException`` exception. If ``None``, it will wait indefinitely.
        The ``timeout`` parameter is the maximum number of seconds that it will wait for a phrase to start before giving up and throwing a ``TimeoutError`` exception. If ``None``, it will wait indefinitely.
        """
        assert isinstance(source, AudioSource), "Source must be an audio source"

@@ -357,22 +356,30 @@ def listen_in_background(self, source, callback):
        """
        Spawns a thread to repeatedly record phrases from ``source`` (an ``AudioSource`` instance) into an ``AudioData`` instance and call ``callback`` with that ``AudioData`` instance as soon as each phrase is detected.
        Returns the thread (a ``threading.Thread`` instance) immediately, while the background thread continues to run in parallel.
        Returns a function object that, when called, stops the background listener thread. The background thread is a daemon and will not stop the program from exiting if there are no other non-daemon threads.
        Phrase recognition uses the exact same mechanism as ``recognizer_instance.listen(source)``.
        The ``callback`` parameter is a function that should accept two parameters - the ``recognizer_instance``, and an ``AudioData`` instance representing the captured audio. Note that this function will be called from a non-main thread.
        """
        assert isinstance(source, AudioSource), "Source must be an audio source"
        import threading
        running = [True]
        def threaded_listen():
            while True:
                with source as s: audio = self.listen(s)
                callback(self, audio)
            with source as s:
                while running[0]:
                    try: # try to detect speech for only one second to do another check if running is enabled
                        audio = self.listen(s, 1)
                    except TimeoutError:
                        pass
                    else:
                        if running[0]: callback(self, audio)
        def stopper():
            running[0] = False
        listener_thread = threading.Thread(target=threaded_listen)
        listener_thread.daemon = True
        listener_thread.start()
        return listener_thread
        return stopper

def shutil_which(pgm):
    """
@@ -383,17 +390,3 @@ def shutil_which(pgm):
        p = os.path.join(p, pgm)
        if os.path.exists(p) and os.access(p, os.X_OK):
            return p

if __name__ == "__main__":
    r = Recognizer()
    m = Microphone()

    while True:
        print("Say something!")
        with m as source:
            audio = r.listen(source)
        print("Got it! Now to recognize it...")
        try:
            print("You said " + r.recognize(audio))
        except LookupError:
            print("Oops! Didn't catch that")
17 changes: 17 additions & 0 deletions speech_recognition/__main__.py
@@ -0,0 +1,17 @@
import speech_recognition as sr

r = sr.Recognizer()
m = sr.Microphone()

print("A moment of silence, please...")
with m as source:
    r.adjust_for_ambient_noise(source)
    print("Set minimum energy threshold to {}".format(r.energy_threshold))
    while True:
        print("Say something!")
        audio = r.listen(source)
        print("Got it! Now to recognize it...")
        try:
            print("You said " + r.recognize(audio))
        except LookupError:
            print("Oops! Didn't catch that")
