cross-posted from: https://lemmit.online/post/225981

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homeassistant by /u/janostrowka on 2023-07-19 12:49:02.

Hopefully this will come in handy for our Year of the Voice.

TL;DR: Justin Alvey replaces the Google Nest Mini's PCB with a custom ESP32 PCB, which he’s open-sourcing. He shows a demo of an LLM voice assistant paired with Beeper to send and receive messages.

Tweet text thread (I would also highly recommend checking out the video demos on Twitter):

I “jailbroke” a Google Nest Mini so that you can run your own LLMs, agents and voice models. Here’s a demo using it to manage all my messages (with help from @onbeeper) 📷 on, and wait for surprise guest! I thought hard about how to best tackle this and why

After looking into jailbreaking options, I opted to completely replace the PCB. This lets you use a cheap ($2) but powerful & developer friendly WiFi chip with a highly capable audio framework. This allows a paradigm of multiple cheap edge devices for audio & voice detection…

& offloading large models to a more powerful local device (whether your M2 Mac, PC server w/ GPU or even “tinybox”!) In most cases this device is already trusted with your credentials and data so you don’t have to hand these off to some cloud & data need never leave your home

The custom PCB uses @EspressifSystem’s ESP32-S3. I went through 2 revisions, from a module to a SoC package with extra flash, simplifying to single-sided SMT (< $10 BOM). All features such as LEDs, capacitive touch, mute switch are working, & even programmable from Arduino (/IDF)

For this demo I used a custom “Maubot” with my @onbeeper credentials (a messaging app which securely bridges your messaging clients using the Matrix protocol & e2e encryption) which runs locally serving an API

I’m then using GPT-3.5 (for speed) with function calling to query this
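The function-calling step might look roughly like the sketch below. It is not the author's code: the `get_messages` function, its parameters and the in-memory inbox are hypothetical stand-ins for the local bot API, and the model call itself is left out. It only shows the general pattern of describing a function to the model in JSON-schema style and then running whatever call the model asks for.

```python
import json

# Hypothetical in-memory stand-in for the locally served messaging API.
FAKE_INBOX = [
    {"sender": "Alice", "text": "Dinner at 7?", "unread": True},
    {"sender": "Bob", "text": "Invoice attached", "unread": False},
    {"sender": "Alice", "text": "Bring wine", "unread": True},
]


def get_messages(sender=None, unread_only=False):
    """Return messages, optionally filtered by sender and unread status."""
    results = FAKE_INBOX
    if sender is not None:
        results = [m for m in results if m["sender"].lower() == sender.lower()]
    if unread_only:
        results = [m for m in results if m["unread"]]
    return results


# Description of the function, in the JSON-schema style that
# function calling expects (names and fields are assumptions).
GET_MESSAGES_SPEC = {
    "name": "get_messages",
    "description": "Fetch recent messages from the local messaging API.",
    "parameters": {
        "type": "object",
        "properties": {
            "sender": {"type": "string"},
            "unread_only": {"type": "boolean"},
        },
        "required": [],
    },
}

# Maps function names the model may request to local Python callables.
HANDLERS = {"get_messages": get_messages}


def dispatch(function_call):
    """Run the function the model asked for; return the result as JSON text."""
    name = function_call["name"]
    if name not in HANDLERS:
        return json.dumps({"error": f"unknown function: {name}"})
    # The model supplies its arguments as a JSON-encoded string.
    args = json.loads(function_call.get("arguments") or "{}")
    return json.dumps(HANDLERS[name](**args))
```

In a real loop, the JSON returned by `dispatch` would be sent back to the model as the function result so it can phrase the spoken answer.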

For the prompt I added details such as family & friends, current date, notification preferences & a list of additional character voices that GPT can respond in. The response is then parsed and sent to @elevenlabsio
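A rough sketch of how such a prompt could be assembled and the reply parsed before text-to-speech. Everything here is an assumption for illustration: the thread does not say what the prompt wording or reply format is, so the JSON reply shape (`voice` plus `text`), the function names and the fallback behaviour are all hypothetical, and the call to the speech service is omitted.

```python
import json
from datetime import date


def build_system_prompt(contacts, today, notify_prefs, voices):
    """Assemble a system prompt from the kinds of details the thread lists."""
    lines = [
        "You are a voice assistant that manages the user's messages.",
        f"Today's date: {today.isoformat()}",
        "Family and friends: " + ", ".join(contacts),
        "Notification preferences: " + notify_prefs,
        "Available voices: " + ", ".join(voices),
        # Hypothetical reply format; the real one is not given in the thread.
        'Reply only with JSON: {"voice": <one of the voices>, "text": <what to say>}',
    ]
    return "\n".join(lines)


def parse_reply(raw, voices, default_voice):
    """Split a model reply into (voice, text), falling back to the default voice."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("reply is not a JSON object")
        text = str(data["text"])
        voice = data.get("voice", default_voice)
    except (ValueError, KeyError):
        # Not valid JSON (or no "text" field): speak the raw reply as-is.
        return default_voice, raw.strip()
    if voice not in voices:
        # The model named a voice we do not have; use the default instead.
        voice = default_voice
    return voice, text
```

The `(voice, text)` pair is what would then be handed to the speech synthesis step.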

I’ve been experimenting with multiple of these, announcing important messages as they come in, morning briefings, noting down ideas and memos, and browsing agents. I couldn’t resist - here’s a playful (unscripted!) video of two talking to each other prompted to be AIs from "Her"

I’m working on open sourcing the PCB design, build instructions, firmware, bot & server code - expect something in the next week or so. If you don’t want to source Nest Minis (or shells from AliExpress) it’s still a great dev platform for developing an assistant! Stay tuned!

  • iMeddles
    1 year ago

    My aim for the Year of the Voice is to replace my Google Minis with something that works locally with HA. If this gets integrated that way, it's gonna save me reasonable amounts of money on speakers :D

    • RegalPotoo@lemmy.world
      1 year ago

      Same, but lack of an open source Cast receiver is going to make that a hard sell for me. I hate that it’s anticompetitive proprietary bullshit, but it works, and works really well.