Speak with ChatGPT just like Amazon Alexa or Google Home

Here’s a quick video demo of using GPT-3 (text-davinci-003) with a voice interface

I have found it quite interesting to experiment with the ChatGPT model since OpenAI released it recently.

I thought it would be quite fun to connect it to a spoken interface, like the Amazon Alexa and Google Home AI assistants.

I decided to go with the following approach: listen for a wake word, record audio until the speaker stops speaking, transcribe the recording to text, use GPT-3 to generate a text response, use Amazon Polly to convert that response to speech, and then “play” the resulting sound in the browser.

Fortunately, the Haley.ai platform enables composing workflows that include models and other functionality. For transcribing audio, I selected the Whisper model. To work with the current OpenAI API, I used the latest GPT-3 model (text-davinci-003) with a prompt similar to the ChatGPT prompt, since ChatGPT is not yet available through the API. For speech, I selected the Amazon Polly voice “Joanna”, one of the “Neural” voices, which support a limited subset of the SSML tags.

The models were composed together with the following workflow:

The screenshot shows the Haley Workflow editor. The three models are chained together, and the result of the Polly model is sent back to the browser.
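As a rough illustration of the chaining idea (this is not the Haley.ai workflow itself), the three stages can be thought of as functions applied in sequence. The class name `VoicePipeline` and the stand-in stage functions below are hypothetical; in a real setup each stage would wrap a call to Whisper, GPT-3, and Polly respectively.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains speech-to-text, text generation, and text-to-speech.

    Each stage is a plain callable, so real model clients (hypothetical
    here) could be swapped in without changing the flow.
    """
    transcribe: Callable[[bytes], str]   # e.g. a Whisper wrapper
    generate: Callable[[str], str]       # e.g. a GPT-3 completion wrapper
    synthesize: Callable[[str], bytes]   # e.g. an Amazon Polly wrapper

    def run(self, recorded_audio: bytes) -> bytes:
        text = self.transcribe(recorded_audio)
        reply = self.generate(text)
        # The synthesized audio is what would be sent back to the browser.
        return self.synthesize(reply)


# Stand-in stages for demonstration only; none of them calls a real service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_generate, fake_synthesize)
result = pipeline.run(b"hello")
```

Keeping each stage behind a simple callable makes it easy to test the flow with stand-ins before wiring in the real models.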

Speaking of Polly, the prompt used with GPT-3 includes examples of SSML tags such as “prosody”, which affect how Polly renders the speech, as in the haiku below:

[Screenshot: haiku]
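To give a feel for what such markup looks like, here is a minimal sketch that wraps a few lines of text in `speak` and `prosody` tags with a short pause between lines. The helper name `to_ssml`, the chosen `rate` value, and the haiku text are my own placeholders, not the prompt or output from the demo; which tags and attributes the Neural voices accept should be checked against the Polly documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(lines, rate="slow", pause_ms=400):
    """Wrap lines of text in <speak>/<prosody>, with a pause between lines."""
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # Escape each line so characters like "<" or "&" do not break the markup.
    body = pause.join(escape(line) for line in lines)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


# Placeholder haiku, invented for this example.
haiku = [
    "Quiet autumn night",
    "a voice answers from the cloud",
    "words turn into sound",
]
ssml = to_ssml(haiku)
```

With boto3, a string like this would typically be passed to the Polly client's `synthesize_speech` call with `TextType="ssml"`, `VoiceId="Joanna"`, and `Engine="neural"`.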

Recent chat interactions are included in the prompt to give GPT-3 a degree of memory of the conversation.
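One simple way to do this, sketched below under my own assumptions, is to keep only the last few turns in a bounded queue and prepend them to each new request. The `ChatMemory` class, the “Human:”/“AI:” labels, and the preamble text are hypothetical and are not the actual prompt used in the demo.

```python
from collections import deque


class ChatMemory:
    """Keeps the most recent turns and renders them into a completion prompt."""

    def __init__(self, preamble, max_turns=5):
        self.preamble = preamble
        # A bounded deque drops the oldest turn automatically when full.
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, user_text, assistant_text):
        self.turns.append((user_text, assistant_text))

    def build_prompt(self, user_text):
        lines = [self.preamble, ""]
        for user, assistant in self.turns:
            lines.append(f"Human: {user}")
            lines.append(f"AI: {assistant}")
        lines.append(f"Human: {user_text}")
        lines.append("AI:")  # the model continues from here
        return "\n".join(lines)


memory = ChatMemory("The following is a conversation with a helpful assistant.", max_turns=2)
memory.add_turn("Hi", "Hello!")
memory.add_turn("What is your name?", "I am an assistant.")
memory.add_turn("Tell me a haiku", "Here is one.")
prompt = memory.build_prompt("Another one, please")
```

Capping the number of turns keeps the prompt within the model's context limit, at the cost of forgetting older parts of the conversation.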

The Haley.ai platform takes care of messaging and running the workflow, as well as the embedded user interface displaying the chat messages.

Within the browser, we needed a wake word to start the voice recording and a way to track voice activity, so that we could stop recording and send the audio to Haley.ai for processing by the workflow.

Fortunately, some open-source projects do the heavy lifting for these tasks.

For wake word detection, I used: https://github.com/jaxcore/bumblebee-hotword

And to detect voice activity, including when speaking has stopped, I used: https://github.com/solyarisoftware/WeBAD
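To illustrate what “detecting when speaking has stopped” can involve, here is a simplified energy-threshold sketch. It is not WeBAD's actual algorithm or API; the function name, the threshold, and the frame counts are assumptions chosen for the example. The idea is to stop once speech has started and a run of consecutive quiet frames follows.

```python
def find_end_of_speech(frame_levels, threshold=0.1, max_silent_frames=3):
    """Return the index of the frame at which recording should stop, or None.

    frame_levels: per-frame loudness values (for example RMS in the range 0..1).
    Recording stops once speech has started and `max_silent_frames`
    consecutive frames fall below `threshold`.
    """
    speech_started = False
    silent_run = 0
    for index, level in enumerate(frame_levels):
        if level >= threshold:
            speech_started = True
            silent_run = 0  # any loud frame resets the silence counter
        elif speech_started:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
    return None  # speech never started, or has not ended yet


# Made-up loudness values: silence, speech with a short pause, then silence.
levels = [0.01, 0.02, 0.5, 0.6, 0.05, 0.4, 0.03, 0.02, 0.01, 0.02]
stop_at = find_end_of_speech(levels)
```

Requiring several silent frames in a row keeps short pauses between words from ending the recording too early.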

I’m hoping to make the voice detection and recording a bit more robust and then publicly release the result.

29 thoughts on “Speak with ChatGPT just like Amazon Alexa or Google Home”

  1. Victor January 4, 2023 / 7:46 pm

    I look forward to more updates on this, thank you, Marc!

    • lullabyman April 10, 2023 / 3:24 am

      This is epic. I’m looking forward to replacing our Alexa with such a device. I’m a little amazed that there is so little information out there on how to do that, especially since so much has happened in AI development in just the 3 months since you posted this.

    • Anonymous July 23, 2023 / 2:51 pm

      Hello

  2. Joel January 12, 2023 / 3:41 pm

    Imagine implementing this on your phone: you would have Iron Man’s Jarvis in your pocket. Except he wouldn’t control my high-tech house, not only because I don’t have one. I really hope this works out. ChatGPT suggested I create a skill for Google Assistant, but I don’t think I have the required knowledge.

      • marchadfield January 15, 2023 / 8:55 pm

        Thanks! Siri was able to launch other apps for a while. There is a bit of an awkward hand-off between triggering Siri and the launched app, but this can work as per that demo for a “one turn” handoff and maybe even (an awkward) multi-turn if the launched app remembers some dialog state.

  3. marchadfield January 12, 2023 / 4:06 pm

    It’s possible to connect this up to Alexa or Google Assistant although those platforms can limit the speech-to-text to fixed “intents” like “Get Weather Report” — so open ended conversations are generally not possible. The technique in this article though could be used directly on phones, just not directly tied to the “Hey Siri” or “Ok Google” wake words.

    • Wally DARLING July 13, 2023 / 2:31 am

      I wish this were available on Google

  4. BigMon January 16, 2023 / 1:49 am

    where can i demo this product?

    • marchadfield January 16, 2023 / 2:04 am

      If there is enough interest, we’ll release a version. Is it something you would subscribe to?

      • Anonymous January 16, 2023 / 6:43 pm

        Of course, just name the price!

  5. Anonymous January 17, 2023 / 7:38 am

    Go GO GO

  6. Josiah Coad January 19, 2023 / 3:44 am

    Please do release!

  7. SouthpawSteven January 26, 2023 / 9:33 pm

    This is very cool! I’m also interested!

  8. Anonymous February 13, 2023 / 6:01 pm

    Very interested as well!

  9. abdul March 9, 2023 / 3:06 am

    Name the price, please…

  10. Ryanar2009 March 20, 2023 / 5:51 am

    Very useful, count me in

  11. Anonymous October 12, 2023 / 12:53 am

    This is never gonna happen, huh?
