dhivehi/selfservicekiosk-audio-streaming

 
 

Google Cloud / Dialogflow - Self Service Kiosk Demo

A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud STT by using websockets.

This is an Airport Self Service Kiosk demo that shows how microphone streaming from a web application to GCP works.

It makes use of the following GCP resources:

  • Dialogflow & Knowledge Bases
  • Speech to Text
  • Text to Speech
  • Translate API
  • (optionally) App Engine Flex

In this demo, you start recording your voice; the application then displays the answers on screen and synthesizes them as speech.
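As a rough illustration of what streaming microphone audio involves, the sketch below converts Web Audio samples (floats between -1 and 1) into 16-bit PCM, the sample layout a LINEAR16 speech encoding expects. The function name `floatTo16BitPCM` and the `socket` in the usage comment are hypothetical, and this is not necessarily how the code in this repo prepares its audio.

```javascript
// Convert Web Audio samples (floats in [-1, 1]) to 16-bit signed PCM.
// Hypothetical helper for illustration; not taken from this repo.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range values do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive values by 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Assumed usage in a browser, with an already open WebSocket named `socket`:
// socket.send(floatTo16BitPCM(chunk).buffer);
```

Clamping before scaling keeps loud input from overflowing into the opposite sign, which would otherwise be heard as clicks.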

Live demo

A working demo can be found here: http://selfservicedesk.appspot.com/

Blog posts

I wrote extensive blog articles on how to set up your streaming project. Want to learn exactly how this code works? Start here:

Blog 1: Introduction to the GCP conversational AI components, and integrating your own voice AI in a web app.
Blog 2: Building a client-side web application which streams audio from a browser microphone to a server.
Blog 3: Building a web server which receives a browser microphone stream and uses Dialogflow or the Speech to Text API for retrieving text results.
Blog 4: Getting Audio Data from Text (Text to Speech) and playing it in your browser.

Slides & Video

There's a presentation and a video that accompany the tutorial.

Slidedeck AudioStreaming

Setup Local Environment

Get a Node.js environment

  1. apt-get install nodejs -y

  2. apt-get install npm -y

Get an Angular environment

  1. sudo npm install -g @angular/cli

Clone Repo

  1. git clone https://github.com/dialogflow/selfservicekiosk-audio-streaming.git selfservicekiosk

  2. Set the PROJECT_ID variable: export PROJECT_ID=[gcp-project-id]

  3. Set the project: gcloud config set project $PROJECT_ID

  4. Download the service account key.

  5. Assign the key to environment var: GOOGLE_APPLICATION_CREDENTIALS

     Linux/macOS: export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json
     Windows: set GOOGLE_APPLICATION_CREDENTIALS=c:\path\to\service_account.json

  6. Log in: gcloud auth login

  7. Open server/env.txt, change the environment variables, and rename the file to server/.env

  8. Enable APIs:

 gcloud services enable \
 appengineflex.googleapis.com \
 containerregistry.googleapis.com \
 cloudbuild.googleapis.com \
 cloudtrace.googleapis.com \
 dialogflow.googleapis.com \
 logging.googleapis.com \
 monitoring.googleapis.com \
 sourcerepo.googleapis.com \
 speech.googleapis.com \
 mediatranslation.googleapis.com \
 texttospeech.googleapis.com \
 translate.googleapis.com

  9. Build the client-side Angular app:

    cd client && sudo npm install
    npm run-script build

  10. Start the server TypeScript app, which is exposed on port 8080:

    cd ../server && sudo npm install
    npm run-script watch

  11. Browse to http://localhost:8080
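Several of the steps above depend on environment variables being set before the server starts. The following is a minimal sketch of a pre-flight check. The helper names `missingEnv` and `describeEnv` are hypothetical, and the list of required variables only covers the two named in this README; the variables in server/.env are not checked because their names are not listed here.

```javascript
// Variables this README asks you to export before running the server.
const REQUIRED = ['PROJECT_ID', 'GOOGLE_APPLICATION_CREDENTIALS'];

// Return the names of required variables that are unset or blank.
function missingEnv(env, required) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Build a one-line, human-readable summary of the check.
function describeEnv(env) {
  const missing = missingEnv(env, REQUIRED);
  return missing.length === 0
    ? 'Environment looks complete.'
    : `Missing: ${missing.join(', ')}`;
}

// Assumed usage from a Node.js script:
// console.log(describeEnv(process.env));
```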

Setup Dialogflow

  1. Create a Dialogflow agent at: http://console.dialogflow.com

  2. Zip the contents of the dialogflow folder, from this repo.

  3. Click settings > Import, and upload the Dialogflow agent zip you just created.

  4. Caution: Knowledge connector settings are not currently included when exporting, importing, or restoring agents, so set them up manually.

    Make sure you have enabled Beta features in settings.

    1. Select Knowledge from the left menu.
    2. Create a Knowledge Base: Airports
    3. Add the following Knowledge Base FAQs, as text/html documents:
    4. As a response, it requires the following custom payload:
    {
      "knowledgebase": true,
      "QUESTION": "$Knowledge.Question[1]",
      "ANSWER": "$Knowledge.Answer[1]"
    }
    
    5. To make the Text to Speech version of the answer work, add the following Text SSML response:
    $Knowledge.Answer[1]
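
To show how a server could consume the custom payload above, here is a minimal sketch. It assumes the payload has already been converted to a plain JavaScript object (the Dialogflow client may deliver it in a protobuf Struct form that needs decoding first), and the function name `extractKnowledgeAnswer` is hypothetical.

```javascript
// Pick the knowledge base question and answer out of a custom payload.
// Assumes `payload` is a plain object with the keys defined above.
function extractKnowledgeAnswer(payload) {
  // Only payloads flagged as knowledge base responses are handled.
  if (!payload || payload.knowledgebase !== true) {
    return null;
  }
  return {
    question: String(payload.QUESTION || ''),
    answer: String(payload.ANSWER || ''),
  };
}

// Assumed usage:
// const kb = extractKnowledgeAnswer(customPayload);
// if (kb) { console.log(kb.answer); }
```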
    

Deploy with App Engine Flex

This demo makes heavy use of websockets, and the getUserMedia() HTML5 microphone API only works when the page is served over HTTPS. Therefore, I deploy this demo with a custom runtime, so I can include my own Dockerfile.

  1. Edit the app.yaml to tweak the environment variables. Set the correct Project ID.

  2. Deploy with: gcloud app deploy

  3. Browse: gcloud app browse
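
The HTTPS requirement explains why the demo works on http://localhost:8080 during development but needs a secure deployment elsewhere: browsers treat localhost as a trustworthy origin. The sketch below is a simplified approximation of that rule; the function name `canUseMicrophone` is hypothetical, and real browsers apply a more complete definition of a secure context.

```javascript
// Rough check of whether a page URL would be allowed to call getUserMedia().
// Simplified: accepts https anywhere, and plain http only on the local machine.
function canUseMicrophone(pageUrl) {
  const { protocol, hostname } = new URL(pageUrl);
  if (protocol === 'https:') {
    return true;
  }
  return protocol === 'http:' && (hostname === 'localhost' || hostname === '127.0.0.1');
}

// Assumed usage:
// canUseMicrophone('http://localhost:8080');  // local development
// canUseMicrophone('http://example.com');     // would need HTTPS
```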

Examples

The self service kiosk is a full end-to-end application. To showcase smaller examples, I've created 6 small demos. Here's how you can get these running:

  1. Install the required libraries by running the following command from the examples folder:

    npm install

  2. Start the simpleserver node app:

    npm --EXAMPLE=1 --PORT=8080 --PROJECT_ID=[your-gcp-project-id] run start

To switch between the examples, set the EXAMPLE variable to one of these:

  • Example 1: Dialogflow Speech Intent Detection
  • Example 2: Dialogflow Speech Detection through streaming
  • Example 3: Dialogflow Speech Intent Detection with Text to Speech output
  • Example 4: Speech to Text Transcribe Recognize Call
  • Example 5: Speech to Text Transcribe Streaming Recognize
  • Example 6: Text to Speech in a browser

  3. Browse to http://localhost:8080. Open the inspector, to preview the Dialogflow results object.
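
The start command passes EXAMPLE, PORT, and PROJECT_ID as npm config flags. npm has traditionally exposed such flags to scripts as `npm_config_*` environment variables, though the exact casing and behavior depend on the npm version. The sketch below shows one way a Node.js script could read them; the function name `readExampleConfig` and the fallback values are assumptions, not necessarily what simpleserver.js does.

```javascript
// Read the three flags from an environment object such as process.env.
// Checks both the original and the lowercased key, since casing may vary.
function readExampleConfig(env) {
  const pick = (key) =>
    env[`npm_config_${key}`] ?? env[`npm_config_${key.toLowerCase()}`];
  return {
    example: Number(pick('EXAMPLE') ?? 1),  // assumed default: example 1
    port: Number(pick('PORT') ?? 8080),     // assumed default: port 8080
    projectId: pick('PROJECT_ID') ?? null,  // no sensible default
  };
}

// Assumed usage:
// const { example, port, projectId } = readExampleConfig(process.env);
```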

The server-side code for the different Dialogflow & STT calls can be found in simpleserver.js. The files example1.html - example5.html show the client-side implementations.
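
For orientation, a streaming Speech to Text call like the one in Example 5 is configured with a request object along the following lines. This is a sketch under assumptions: the field names follow the Speech-to-Text v1 streaming configuration, but the defaults (LINEAR16, 16000 Hz, en-US) and the helper name `buildStreamingRequest` are illustrative and may differ from what simpleserver.js uses.

```javascript
// Build a request object for a streaming recognition call.
// Language and sample rate defaults are assumptions for illustration.
function buildStreamingRequest({ languageCode = 'en-US', sampleRateHertz = 16000 } = {}) {
  return {
    config: {
      encoding: 'LINEAR16',  // 16-bit signed PCM
      sampleRateHertz,       // must match the audio actually sent
      languageCode,
    },
    interimResults: true,    // receive partial transcripts while the user speaks
  };
}

// Assumed usage with the @google-cloud/speech client:
// const stream = client.streamingRecognize(buildStreamingRequest());
```

Setting `interimResults` to true trades more response traffic for transcripts that update while the user is still speaking, which suits a kiosk-style interface.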

License

Apache 2.0

This is not an official Google product.
