From 20009aed53d8864c9204d43a17895168a777d2cc Mon Sep 17 00:00:00 2001
From: Ilan Bigio
Date: Mon, 16 Dec 2024 13:06:08 -0800
Subject: Initial commit

---
 README.md | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 README.md

(limited to 'README.md')

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3acc522
--- /dev/null
+++ b/README.md
@@ -0,0 +1,110 @@
+# OpenAI Realtime API with Twilio Quickstart
+
+Combine OpenAI's Realtime API and Twilio's phone calling capability to build an AI calling assistant.
+
+[Image: Screenshot 2024-12-17 at 1 46 18 AM]
+
+## Quick Setup
+
+Open three terminal windows:
+
+| Terminal | Purpose                       | Quick Reference (see below for more) |
+| -------- | ----------------------------- | ------------------------------------ |
+| 1        | To run the `webapp`           | `npm run dev`                        |
+| 2        | To run the `websocket-server` | `npm run dev`                        |
+| 3        | To run `ngrok`                | `ngrok http 8081`                    |
+
+Make sure all variables in `webapp/.env` and `websocket-server/.env` are set correctly. See the [full setup](#full-setup) section for more.
+
+## Overview
+
+This repo implements a phone calling assistant with the Realtime API and Twilio, and has two main parts: the `webapp` and the `websocket-server`.
+
+1. `webapp`: Next.js app that serves as a frontend for call configuration and transcripts
+2. `websocket-server`: Express backend that handles the connection from Twilio, connects it to the Realtime API, and forwards messages to the frontend
+
+[Image: Screenshot 2024-12-17 at 12 54 12 PM]
+
+Twilio uses TwiML (a form of XML) to specify how to handle a phone call. When a call comes in, we tell Twilio to start a bi-directional stream to our backend, where we forward messages between the call and the Realtime API. (`{{WS_URL}}` is replaced with our websocket endpoint.)
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- TwiML to start a bi-directional stream -->
+
+<Response>
+  <Say>Connected</Say>
+  <Connect>
+    <Stream url="{{WS_URL}}" />
+  </Connect>
+  <Say>Disconnected</Say>
+</Response>
+```
+
+We use `ngrok` to make our server reachable by Twilio.
+
+### Life of a phone call
+
+Setup
+
+1. We run ngrok to make our server reachable by Twilio
+1. We set the Twilio webhook to our ngrok address
+1. Frontend connects to the backend (`wss://[your_backend]/logs`), ready for a call
+
+Call
+
+1. Call is placed to the Twilio-managed number
+1. Twilio queries the webhook (`http://[your_backend]/twiml`) for TwiML instructions
+1. Twilio opens a bi-directional stream to the backend (`wss://[your_backend]/call`)
+1. The backend connects to the Realtime API, and starts forwarding messages:
+   - between Twilio and the Realtime API
+   - between the frontend and the Realtime API
+
+### Function Calling
+
+This demo mocks out function calls so you can provide sample responses. In a real application, you would handle the function call, execute some code, and then supply the response back to the model.
+
+## Full Setup
+
+1. Make sure your [auth & env](#detailed-auth--env) is configured correctly.
+
+2. Run webapp.
+
+```shell
+cd webapp
+npm install
+npm run dev
+```
+
+3. Run websocket server.
+
+```shell
+cd websocket-server
+npm install
+npm run dev
+```
+
+## Detailed Auth & Env
+
+### OpenAI & Twilio
+
+Set your credentials in `webapp/.env` and `websocket-server/.env`. See `webapp/.env.example` and `websocket-server/.env.example` for reference.
+
+### Ngrok
+
+Twilio needs to be able to reach your websocket server. If you're running it locally, its ports are not reachable from the public internet by default. [ngrok](https://ngrok.com/) can make them temporarily accessible.
+
+We have set the `websocket-server` to run on port `8081` by default, so that is the port we will be forwarding.
+
+```shell
+ngrok http 8081
+```
+
+Make note of the `Forwarding` URL (e.g. `https://54c5-35-170-32-42.ngrok-free.app`).
+
+### Websocket URL
+
+Your server should now be accessible at the `Forwarding` URL when run, so set the `PUBLIC_URL` in `websocket-server/.env` to that URL. See `websocket-server/.env.example` for reference.
+
+## Additional Notes
+
+This repo isn't polished, and the security practices leave something to be desired. Please only use this as a reference, and make sure to audit your app with security and engineering before deploying!
-- 
cgit v1.2.3
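
Note on the TwiML step from the README's Overview: the sketch below shows one way the `{{WS_URL}}` placeholder could be filled in before the TwiML is returned from the `/twiml` webhook. It is illustrative only and is not taken from the repo. The helper names `toCallStreamUrl` and `renderTwiml` are hypothetical, and the template assumes Twilio's `<Connect>` and `<Stream>` TwiML verbs. Only `{{WS_URL}}`, `PUBLIC_URL`, and the `/call` path come from the README.

```typescript
// Assumed TwiML template; the README only confirms the {{WS_URL}} placeholder
// and the spoken "Connected" / "Disconnected" messages.
const TWIML_TEMPLATE = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Connected</Say>
  <Connect>
    <Stream url="{{WS_URL}}" />
  </Connect>
  <Say>Disconnected</Say>
</Response>`;

// Hypothetical helper: derive the WebSocket URL of the call endpoint from the
// public HTTP(S) URL (for example, the ngrok Forwarding URL in PUBLIC_URL).
function toCallStreamUrl(publicUrl: string): string {
  const url = new URL(publicUrl);
  // Plain http maps to ws; anything else is treated as TLS and maps to wss.
  const scheme = url.protocol === "http:" ? "ws" : "wss";
  return `${scheme}://${url.host}/call`;
}

// Hypothetical helper: substitute the placeholder, escaping characters that
// are special inside an XML attribute value.
function renderTwiml(template: string, wsUrl: string): string {
  const escaped = wsUrl
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
  // A replacer function avoids "$" in the URL being read as a replace pattern.
  return template.replace("{{WS_URL}}", () => escaped);
}

// Usage: build the TwiML body for a given public URL.
const exampleTwiml = renderTwiml(
  TWIML_TEMPLATE,
  toCallStreamUrl("https://54c5-35-170-32-42.ngrok-free.app")
);
```

In an Express handler, the resulting string would typically be sent with a `text/xml` content type; how the repo actually wires this up is not shown in the README.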
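
Note on the Function Calling section: the README says the demo mocks function calls, and that a real application would handle the call, run some code, and supply the response back to the model. A minimal sketch of that dispatch step follows. Everything in it is an assumption made for illustration: the `FunctionCall` shape, the `get_weather` example, and the `handleFunctionCall` name are hypothetical and are not the repo's or the Realtime API's actual types.

```typescript
// Assumed shape of a function call requested by the model: a function name
// plus JSON-encoded arguments. Hypothetical, for illustration only.
interface FunctionCall {
  name: string;
  arguments: string;
}

type Handler = (args: Record<string, unknown>) => Promise<unknown>;

// Registry of real implementations, keyed by function name.
// `get_weather` is a made-up example that returns a canned value.
const handlers = new Map<string, Handler>([
  ["get_weather", async (args) => ({ location: args.location, temperature_c: 21 })],
]);

// Run the matching handler and return a JSON string that could be supplied
// back to the model as the function's output. Errors are reported as JSON
// rather than thrown, so the model always receives some response.
async function handleFunctionCall(call: FunctionCall): Promise<string> {
  const handler = handlers.get(call.name);
  if (!handler) {
    return JSON.stringify({ error: `Unknown function: ${call.name}` });
  }
  try {
    const args = JSON.parse(call.arguments || "{}") as Record<string, unknown>;
    return JSON.stringify(await handler(args));
  } catch (err) {
    return JSON.stringify({ error: String(err) });
  }
}
```

Returning errors as data instead of throwing is a design choice in this sketch: it keeps the conversation going when the model requests an unknown function or sends malformed arguments.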