Here’s a quick video demo of using GPT-3 (text-davinci-003) with a voice interface
It has been quite interesting to experiment with the ChatGPT model since OpenAI recently released it.
I thought it would be fun to connect it to a spoken interface, like the Amazon Alexa and Google Home AI assistants.
I decided to go with the following approach: listen for a wake word, record audio until the speaker stops speaking, transcribe the recording to text, use GPT-3 to generate a text response, use Amazon Polly to turn that response into speech, and then play the resulting audio in the browser.
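The per-turn sequence of stages can be pictured as a simple chain of functions. This is a minimal sketch, not the Haley.ai implementation: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for the Whisper, GPT-3, and Polly calls, injected as plain functions so the chaining can be shown without any network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains one conversational turn: speech -> text -> reply -> speech.

    All three stages are hypothetical placeholders for the real
    Whisper, GPT-3 and Polly calls.
    """

    transcribe: Callable[[bytes], str]     # recorded audio -> transcript
    generate_reply: Callable[[str], str]   # transcript -> reply (SSML) text
    synthesize: Callable[[str], bytes]     # reply text -> audio to play

    def handle_turn(self, recorded_audio: bytes) -> bytes:
        # Runs after the wake word fired and voice activity detection
        # decided the speaker has stopped talking.
        transcript = self.transcribe(recorded_audio)
        reply = self.generate_reply(transcript)
        # The returned audio is what the browser would play back.
        return self.synthesize(reply)


if __name__ == "__main__":
    # Fake stages so the chaining can be run offline.
    pipeline = VoicePipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        generate_reply=lambda text: f"<speak>You said: {text}</speak>",
        synthesize=lambda ssml: ssml.encode("utf-8"),
    )
    print(pipeline.handle_turn(b"what time is it"))
```

Injecting the stages keeps the ordering logic separate from any particular speech or language model, which is roughly what a workflow editor provides visually.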
Fortunately, the Haley.ai platform enables composing workflows that include models and other functionality. The Whisper model was selected for transcribing audio. Because ChatGPT is not yet available through the API, the latest GPT-3 model (text-davinci-003) was used via the current OpenAI API, with a prompt similar to the ChatGPT prompt. The Amazon Polly voice “Joanna” was selected; it is one of the “Neural” voices, which support only a limited subset of SSML tags.
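Since GPT-3 is prompted to emit SSML tags and the Neural voices accept only a subset of SSML, the model output will not always be something Polly accepts (Polly rejects invalid SSML). A minimal guard, assuming it runs between the GPT-3 and Polly steps (the post does not say whether the workflow has one), is to parse the output and fall back to escaped plain text. The allow-list below is an illustrative assumption; Polly's documentation lists exactly which tags each voice type supports.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical allow-list; verify against Polly's Neural-voice SSML docs.
ALLOWED_TAGS = {"speak", "prosody", "break", "p", "s"}


def to_speakable_ssml(model_output: str) -> str:
    """Return well-formed SSML using only allowed tags, else plain text in <speak>."""
    text = model_output.strip()
    if not text.startswith("<speak>"):
        text = f"<speak>{text}</speak>"
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Malformed markup (e.g. a bare "&"): speak the raw output as text.
        return f"<speak>{escape(model_output.strip())}</speak>"
    if any(el.tag not in ALLOWED_TAGS for el in root.iter()):
        # Unsupported tag: keep only the words, drop all markup.
        return f"<speak>{escape(''.join(root.itertext()))}</speak>"
    return text
```

Falling back to plain text trades away the prosody effects for that one reply, but keeps the assistant speaking instead of failing the whole turn.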
The models were composed together with the following workflow:

The screenshot shows the Haley Workflow editor. The three models are chained together, and the output of the Polly model is sent back to the browser.
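For reference, the Polly step on its own might look like the sketch below, using the AWS SDK for Python (boto3) and its `synthesize_speech` call. This is an assumption about how such a call could be made directly, not how Haley.ai invokes Polly; the function names and the mp3 output format are illustrative choices.

```python
from typing import Any, Dict


def polly_request(ssml: str) -> Dict[str, Any]:
    """Keyword arguments for a Polly SynthesizeSpeech call (hypothetical helper)."""
    return {
        "Engine": "neural",     # Joanna is one of the Neural voices
        "VoiceId": "Joanna",
        "TextType": "ssml",     # the text is SSML markup, not plain text
        "Text": ssml,
        "OutputFormat": "mp3",  # assumed: a format browsers can play directly
    }


def synthesize(ssml: str) -> bytes:
    """Call Polly and return the audio bytes to send back to the browser."""
    # Imported here so polly_request() works without boto3 installed.
    # Requires AWS credentials with Polly access.
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**polly_request(ssml))
    # AudioStream is a streaming body; read it fully into memory.
    return response["AudioStream"].read()
```

Keeping the request parameters in a separate function makes the voice and engine choice easy to inspect or change without touching the network call.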
Speaking of Polly, the prompt used with GPT-3 includes examples of SSML tags such as “prosody”, which changes how Polly speaks the output, as in the haiku below:
Recent chat interactions are included in the prompt, giving GPT-3 a degree of memory of the conversation so far.
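One simple way to give a completion model this kind of memory is a fixed-size window of recent turns prepended to each new request. The sketch below assumes a “Human:”/“AI:” transcript format, a hypothetical one-line preamble, and a limit of six turns; none of these is taken from the actual prompt, which is not reproduced here.

```python
from collections import deque
from typing import Deque, Tuple

# Hypothetical preamble; the real ChatGPT-like prompt is not shown in the post.
PREAMBLE = "The following is a conversation with a helpful voice assistant."


class ChatMemory:
    """Keeps only the most recent turns so the prompt stays within the context window."""

    def __init__(self, max_turns: int = 6) -> None:
        # Each entry is (user_text, assistant_text); the oldest turn drops off first.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add_turn(self, user_text: str, assistant_text: str) -> None:
        self.turns.append((user_text, assistant_text))

    def build_prompt(self, new_user_text: str) -> str:
        lines = [PREAMBLE]
        for user_text, assistant_text in self.turns:
            lines.append(f"Human: {user_text}")
            lines.append(f"AI: {assistant_text}")
        lines.append(f"Human: {new_user_text}")
        lines.append("AI:")  # the completion model continues from here
        return "\n".join(lines)
```

A turn-count window is the simplest policy; a token-count limit would track the model's context size more precisely.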
The Haley.ai platform takes care of messaging and running the workflow, and provides the embedded user interface that displays the chat messages.
Within the browser, we needed a wake word to start voice recording, and a way to track voice activity so that we could stop recording and send the audio to Haley.ai for processing by the workflow.
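The browser-side control flow amounts to a small state machine. The sketch below is written in Python for consistency with the other examples rather than as browser code; the state names, event methods, and the `sent` list (standing in for uploads to the workflow) are all hypothetical, and in practice the events would come from the hotword and voice-activity libraries.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    WAITING_FOR_WAKE_WORD = auto()
    RECORDING = auto()
    PROCESSING = auto()


class VoiceSession:
    """Hypothetical controller: wake word -> record -> send -> wait for reply."""

    def __init__(self) -> None:
        self.state = State.WAITING_FOR_WAKE_WORD
        self.chunks: List[bytes] = []
        self.sent: List[bytes] = []  # stands in for uploads to the workflow

    def on_wake_word(self) -> None:
        # Only a wake word heard while idle starts a new recording.
        if self.state is State.WAITING_FOR_WAKE_WORD:
            self.chunks = []
            self.state = State.RECORDING

    def on_audio(self, chunk: bytes) -> None:
        # Audio outside the RECORDING state is ignored.
        if self.state is State.RECORDING:
            self.chunks.append(chunk)

    def on_speech_ended(self) -> None:
        # Voice activity detection reported silence: send the whole recording.
        if self.state is State.RECORDING:
            self.sent.append(b"".join(self.chunks))
            self.state = State.PROCESSING

    def on_reply_played(self) -> None:
        # After the spoken reply has played, listen for the wake word again.
        if self.state is State.PROCESSING:
            self.state = State.WAITING_FOR_WAKE_WORD
```

Ignoring the wake word while a reply is pending also keeps the assistant's own synthesized voice from retriggering a recording.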
Fortunately, some open-source projects do the heavy lifting for these tasks.
For wake word detection, I used: https://github.com/jaxcore/bumblebee-hotword
And to detect voice activity, including when speaking has stopped, I used: https://github.com/solyarisoftware/WeBAD
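As a rough illustration of end-of-speech detection (a simple energy-threshold approach, not WeBAD's actual algorithm), a detector can wait until speech has been heard and then stop after a run of quiet frames. The threshold and frame count are assumptions: with 20 ms frames, 25 silent frames is about half a second of silence.

```python
import math
from typing import Iterable, List, Optional


def rms(frame: List[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def end_of_speech_frame(
    frames: Iterable[List[float]],
    threshold: float = 0.02,
    silence_frames_needed: int = 25,
) -> Optional[int]:
    """Index of the frame at which speech is considered finished, or None.

    Silence before any speech is ignored; once speech has started, recording
    stops after `silence_frames_needed` consecutive frames below `threshold`.
    """
    speech_started = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            speech_started = True
            silent_run = 0  # any loud frame resets the silence counter
        elif speech_started:
            silent_run += 1
            if silent_run >= silence_frames_needed:
                return i
    return None
```

A fixed threshold is fragile in noisy rooms, which is one reason dedicated voice-activity libraries are preferable to a hand-rolled check.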
I’m hoping to make the voice detection and recording a bit more robust and then publicly release the result.
I look forward to more updates on this. Thank you, Marc!
This is epic. I’m looking forward to replacing our Alexa with such a device. I’m a little amazed that there is so little information out there on how to do that, especially since so much has happened in AI development in just the 3 months since you posted this.
Imagine implementing this on your phone: you would have Iron Man’s Jarvis in your pocket. Except he wouldn’t control my high-tech house, not least because I don’t have one. I really hope this works out. ChatGPT suggested I create a skill for Google Assistant, but I don’t think I have the required knowledge.
You can now: https://www.youtube.com/watch?v=BezLkm1bFmU
Thanks! Siri has been able to launch other apps for a while. There is a somewhat awkward hand-off between triggering Siri and the launched app, but, as that demo shows, this can work for a “one-turn” hand-off, and perhaps even for an (awkward) multi-turn conversation if the launched app remembers some dialog state.
It’s possible to connect this to Alexa or Google Assistant, although those platforms can restrict speech-to-text to fixed “intents” such as “Get Weather Report”, so open-ended conversations are generally not possible. The technique in this article could, however, be used directly on phones, just not tied to the “Hey Siri” or “OK Google” wake words.
Where can I demo this product?
If there is enough interest, we’ll release a version. Is it something you would subscribe to?
Of course, just name the price!
Thanks! Good to know!
Go, go, go!
Please do release!
I’m interested also 🙂
That’s great! Thanks!
This is very cool! I’m also interested!
Thanks so much!
Very interested as well!
That’s great!
Throw me in that briar patch too! I’m in!
Great!
Let’s go!
https://github.com/Yue-Yang/ChatGPT-Siri
Name the price, please…
Very useful, count me in!
Love it!