This project demonstrates how to integrate Twilio's Programmable Voice API with OpenAI's Realtime API to build real-time voice agents. Users place a voice call via Twilio, and the app proxies the call audio between Twilio and OpenAI's Realtime API.
- The /incoming-call endpoint responds to Twilio's incoming call webhook with the TwiML noun <Stream/>.
- A Media Stream is established with the app's websocket endpoint.
- Audio packets from the voice call are forwarded to OpenAI's Realtime API.
- OpenAI responds with audio packets, which are forwarded to Twilio.
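As a rough illustration of the first step in the flow above, the TwiML returned by /incoming-call might be built as follows. This is a minimal sketch, not the project's actual code: the websocket path `/media-stream` and the helper names are hypothetical, and wrapping `<Stream/>` in the `<Connect>` verb is an assumption based on the audio flowing in both directions.

```typescript
// Hypothetical path for the app's websocket endpoint; the README does not name it.
const MEDIA_STREAM_PATH = "/media-stream";

// Escape characters that are not allowed inside an XML attribute value.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the TwiML sent back to Twilio's incoming-call webhook.
// <Connect><Stream> asks Twilio to open a Media Stream to the given wss:// URL.
function buildStreamTwiml(hostname: string): string {
  const url = `wss://${hostname}${MEDIA_STREAM_PATH}`;
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<Response><Connect><Stream url="${escapeXmlAttr(url)}"/></Connect></Response>`
  );
}
```

In an Express handler, the returned string would be sent with a `text/xml` content type.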
- Twilio account with a phone number
- OpenAI Platform account and OPENAI_API_KEY
- ngrok installed globally
git clone https://github.com/pBread/twilio-openai-voicebot-simple
cd twilio-openai-voicebot-simple
npm install
The application needs to know the domain (HOSTNAME) it is deployed to in order to function correctly. This domain is set in the HOSTNAME environment variable and must be configured before starting the app.
Start ngrok by running this command.
ngrok http 3000
Then copy the domain from ngrok's forwarding URL, without the https:// prefix.
Note: ngrok provides static domains for all ngrok users. You can avoid updating HOSTNAME every time by provisioning your own static domain.
OPENAI_API_KEY=your-openai-api-key
HOSTNAME=your-ngrok-domain.ngrok.app
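Because the app depends on both variables being present, it can help to validate them at startup. The following is a hedged sketch of such a check; `readConfig` and `AppConfig` are hypothetical names, and stripping a pasted protocol or trailing path from HOSTNAME is a convenience assumed here, not something the project is known to do.

```typescript
interface AppConfig {
  openaiApiKey: string;
  hostname: string;
}

// Read and validate the two variables shown above.
// Pass in process.env (or any plain object) so the function stays easy to test.
function readConfig(env: Record<string, string | undefined>): AppConfig {
  const openaiApiKey = env["OPENAI_API_KEY"]?.trim();
  const rawHostname = env["HOSTNAME"]?.trim();
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is not set");
  if (!rawHostname) throw new Error("HOSTNAME is not set");

  // Accept a pasted URL such as "https://abc.ngrok.app/" and keep only the domain.
  const hostname = rawHostname
    .replace(/^[a-z]+:\/\//i, "")
    .replace(/\/.*$/, "");
  if (!hostname) throw new Error("HOSTNAME does not contain a domain");

  return { openaiApiKey, hostname };
}
```

A typical call would be `readConfig(process.env)` before the server starts listening, so a missing value fails fast instead of surfacing mid-call.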
The following command starts the Express server, which handles incoming Twilio webhook requests and media streams.
npm run dev
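Handling the media stream largely comes down to translating messages between the two websockets. The sketch below shows one way that translation could look; the function names are hypothetical. It assumes the OpenAI session is configured for the same audio format Twilio sends (base64-encoded mu-law), so payloads can be passed through unchanged, and it accepts both `response.audio.delta` and `response.output_audio.delta` because the event name differs between Realtime API versions.

```typescript
// Convert a raw Twilio Media Stream message into an OpenAI Realtime client event.
// Returns null for messages that carry no audio (for example "start" or "stop").
function twilioToOpenAI(raw: string): string | null {
  const msg = JSON.parse(raw) as { event?: string; media?: { payload?: string } };
  if (msg.event !== "media" || !msg.media?.payload) return null;
  // Twilio's payload is already base64, which is what input_audio_buffer.append expects.
  return JSON.stringify({ type: "input_audio_buffer.append", audio: msg.media.payload });
}

// Convert a raw OpenAI Realtime server event into a Twilio Media Stream message.
// Returns null for events that are not audio deltas.
function openAIToTwilio(raw: string, streamSid: string): string | null {
  const msg = JSON.parse(raw) as { type?: string; delta?: string };
  const isAudioDelta =
    msg.type === "response.audio.delta" || msg.type === "response.output_audio.delta";
  if (!isAudioDelta || !msg.delta) return null;
  // streamSid comes from Twilio's "start" message and identifies the call's stream.
  return JSON.stringify({ event: "media", streamSid, media: { payload: msg.delta } });
}
```

Keeping the translation in pure functions like these leaves the websocket wiring thin: each socket's message handler forwards whatever non-null string the matching function returns.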
Go to your Twilio Console and configure the Voice webhooks for your Twilio phone number:
- Incoming Call Webhook: Select POST and set the URL to: https://your-ngrok-domain.ngrok.app/incoming-call
- Call Status Update Webhook: Select POST and set the URL to: https://your-ngrok-domain.ngrok.app/call-status-update
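Twilio delivers these POST requests as URL-encoded form bodies. As an illustration of what the /call-status-update endpoint receives, here is a small sketch that extracts the `CallSid` and `CallStatus` fields Twilio includes in status callbacks. The function and interface names are hypothetical; in an Express app the same fields would usually be read from a body parsed by `express.urlencoded()`.

```typescript
interface CallStatusUpdate {
  callSid: string;
  callStatus: string;
}

// Parse the URL-encoded body of a Twilio call status callback.
function parseCallStatusUpdate(body: string): CallStatusUpdate {
  const params = new URLSearchParams(body);
  const callSid = params.get("CallSid");
  const callStatus = params.get("CallStatus");
  if (!callSid || !callStatus) {
    throw new Error("status callback is missing CallSid or CallStatus");
  }
  return { callSid, callStatus };
}
```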
You're all set. Place a call to your Twilio phone number and you should see the real-time transcript logged in your local terminal.