How to Build a Local Voice Satellite for Hermes or OpenClaw
A Nest Mini-style ESP32-S3 mic and speaker puck for LAN agent chats

What you'll build
This guide builds a small voice satellite: an ESP32-S3 board with an INMP441 I2S microphone, a MAX98357A I2S amplifier, a compact speaker, a push-to-talk button, and a simple status LED. The finished device feels like a tiny desk speaker puck, but it stays honest about what runs where: the ESP32 captures a short WAV clip and plays back audio, while a local bridge computer handles speech-to-text, calls your Hermes or OpenClaw agent over the LAN, and returns a synthesized WAV reply.
The starter sketch records three seconds of 16 kHz mono audio, posts it to a bridge endpoint such as http://192.168.1.50:8787/voice, then streams the returned WAV through the I2S amplifier. On the bridge side, Hermes can be reached through its OpenAI-compatible API server when the gateway is enabled on port 8642; OpenClaw exposes OpenAI-compatible chat and responses endpoints from its gateway, commonly on port 18789. Keep those agent credentials on the bridge machine rather than baking them into the microcontroller.
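To make the recording size concrete, here is a minimal sketch of the buffer arithmetic and the 44-byte header a canonical PCM WAV file needs. The guide only states three seconds of 16 kHz mono audio; the 16-bit sample depth, the constant names, and the `makeWavHeader` helper are assumptions for illustration, not the code Schematik ships.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Capture format. 16 kHz, mono, 3 s come from the guide; 16-bit PCM is an assumption.
constexpr uint32_t kSampleRate = 16000;
constexpr uint16_t kChannels = 1;
constexpr uint16_t kBitsPerSample = 16;
constexpr uint32_t kRecordSeconds = 3;

constexpr uint32_t kBytesPerSecond = kSampleRate * kChannels * (kBitsPerSample / 8);
constexpr uint32_t kPcmBytes = kBytesPerSecond * kRecordSeconds;  // 32000 * 3 = 96000

// Store `value` little-endian in the first `bytes` bytes at `dst`.
inline void putLE(uint8_t* dst, uint32_t value, size_t bytes) {
  for (size_t i = 0; i < bytes; ++i) {
    dst[i] = static_cast<uint8_t>(value >> (8 * i));
  }
}

// Copy a four-character chunk tag such as "RIFF".
inline void putTag(uint8_t* dst, const char* tag) {
  for (size_t i = 0; i < 4; ++i) {
    dst[i] = static_cast<uint8_t>(tag[i]);
  }
}

// Hypothetical helper: build the canonical 44-byte PCM WAV header.
inline std::array<uint8_t, 44> makeWavHeader(uint32_t pcmBytes) {
  std::array<uint8_t, 44> h{};
  putTag(&h[0], "RIFF");
  putLE(&h[4], 36 + pcmBytes, 4);        // bytes that follow this field
  putTag(&h[8], "WAVE");
  putTag(&h[12], "fmt ");
  putLE(&h[16], 16, 4);                  // size of the PCM fmt chunk
  putLE(&h[20], 1, 2);                   // audio format 1 = uncompressed PCM
  putLE(&h[22], kChannels, 2);
  putLE(&h[24], kSampleRate, 4);
  putLE(&h[28], kBytesPerSecond, 4);
  putLE(&h[32], kChannels * (kBitsPerSample / 8), 2);  // block align
  putLE(&h[34], kBitsPerSample, 2);
  putTag(&h[36], "data");
  putLE(&h[40], pcmBytes, 4);
  return h;
}
```

Under these assumptions one clip is 96,000 bytes of samples plus the header. That size is worth checking against the RAM or PSRAM available on your particular ESP32-S3 module before you extend the recording length.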
This is a practical base for a local voice interface, not an always-listening commercial assistant clone. Start with push-to-talk, a trusted LAN or Tailnet, and a visible recording LED. Once the bridge is reliable, you can add wake-word detection, a nicer enclosure, acoustic echo handling, longer recordings, or provider-specific STT/TTS choices such as local faster-whisper, Edge-style TTS, Piper, ElevenLabs, OpenAI, or whatever your agent host already uses.
Wiring diagram
Components needed
| Component | Type | Qty | Buy |
|---|---|---|---|
| ESP32-S3 development board | board | 1 | |
| INMP441 I2S microphone | sensor | 1 | |
| MAX98357A I2S amplifier | actuator | 1 | |
| Compact 8Ω speaker | actuator | 1 | |
| Push-to-talk button | other | 1 | |
| Status LED | actuator | 1 | |
Assembly
Wire the voice satellite
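Connect the listed modules using the pin constants in the Code section below (microphone on GPIO 4, 5, and 6; amplifier on GPIO 7, 15, and 16; push-to-talk button on GPIO 0; status LED on GPIO 2), keeping all grounds common and checking voltage markings before powering the board.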
Upload the sketch
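Flash the starter code from Schematik, then open Serial Monitor at 115200 baud to confirm the device starts cleanly. With the sketch shown below you should see "Starting voice satellite" once, followed by "voice satellite running" every second.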
Test the behaviour
Exercise the main input/output path: hold the push-to-talk button, speak, and confirm that a reply plays through the speaker. Adjust the pin constants only if your board revision uses different labels.
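One way to test the speaker path without involving the bridge is to play a synthetic tone. The helper below is a rough sketch that fills a buffer with 16-bit sine samples; the function name, the default 16 kHz rate, and the quarter-scale amplitude are assumptions, and writing the samples to the amplifier is left to whichever I2S driver your sketch uses.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: generate a mono sine tone as signed 16-bit PCM samples.
// amplitude is a fraction of full scale; 0.25 keeps a small speaker comfortably quiet.
inline std::vector<int16_t> makeTestTone(double frequencyHz,
                                         double seconds,
                                         uint32_t sampleRate = 16000,
                                         double amplitude = 0.25) {
  const double kTwoPi = 6.283185307179586;
  const size_t count = static_cast<size_t>(seconds * sampleRate);
  std::vector<int16_t> samples(count);
  for (size_t n = 0; n < count; ++n) {
    const double phase = kTwoPi * frequencyHz * static_cast<double>(n) / sampleRate;
    samples[n] = static_cast<int16_t>(std::lround(amplitude * 32767.0 * std::sin(phase)));
  }
  return samples;
}
```

For example, `makeTestTone(440.0, 0.5)` yields 8,000 samples, or 16,000 bytes. If the tone plays cleanly but bridge replies do not, the fault is more likely in the network or WAV handling than in the amplifier wiring.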
Code
#include <Arduino.h>

// voice satellite starter. Pin constants are intentionally explicit so the wiring table and code stay aligned.
#define MIC_BCLK_PIN 4
#define MIC_WS_PIN 5
#define MIC_DATA_PIN 6
#define AMP_BCLK_PIN 7
#define AMP_WS_PIN 15
#define AMP_DATA_PIN 16
#define PTT_PIN 0
#define STATUS_LED_PIN 2

void setup() {
  Serial.begin(115200);
  delay(200);
  Serial.println("Starting voice satellite");
}

void loop() {
  Serial.println("voice satellite running");
  delay(1000);
}
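
The listing above only prints a heartbeat, so the push-to-talk button is not read yet. As a sketch of how button handling might look, the class below debounces a raw button reading and reports press and release edges. The class name, the event enum, and the 30 ms settle time are all assumptions, as is the wiring it implies (button to ground with an internal pull-up, so a LOW reading means pressed).

```cpp
#include <cstdint>

enum class PttEvent { None, Pressed, Released };

// Hypothetical debouncer: a raw reading must stay unchanged for stableMs
// before it is accepted as the new button state.
class PttDebouncer {
 public:
  explicit PttDebouncer(uint32_t stableMs = 30) : stableMs_(stableMs) {}

  // rawPressed: true while the pin reads as pressed (LOW with the assumed pull-up wiring).
  // nowMs: a monotonically increasing millisecond timestamp.
  PttEvent update(bool rawPressed, uint32_t nowMs) {
    if (rawPressed != lastRaw_) {
      // The reading changed, so restart the settle timer.
      lastRaw_ = rawPressed;
      lastChangeMs_ = nowMs;
    }
    if (rawPressed != stable_ && (nowMs - lastChangeMs_) >= stableMs_) {
      stable_ = rawPressed;
      return stable_ ? PttEvent::Pressed : PttEvent::Released;
    }
    return PttEvent::None;
  }

  bool isPressed() const { return stable_; }

 private:
  uint32_t stableMs_;
  uint32_t lastChangeMs_ = 0;
  bool lastRaw_ = false;
  bool stable_ = false;
};
```

In an Arduino `loop()` this could be driven with `debouncer.update(digitalRead(PTT_PIN) == LOW, millis())`, after calling `pinMode(PTT_PIN, INPUT_PULLUP)` in `setup()`. A `Pressed` event would start recording and switch on the status LED; `Released` would stop the capture and post the clip to the bridge.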
Open this project in Schematik to get the full wiring diagram, pin assignments, and deployable code for the Hermes Voice Satellite.