How to Build an ESP32 Bluetooth Speaker

Stream phone audio to real speakers with an I2S DAC and XH-A232 amp

ESP32 · Music · Intermediate · 90 minutes · 7 components


What you'll build

Build a proper Bluetooth speaker around an ESP32 DevKit v1. The ESP32 acts as a Bluetooth Classic A2DP audio sink, sends digital audio over I2S to a PCM5102A DAC, and the DAC feeds line-level left/right audio into an XH-A232 TPA3110 stereo amplifier. A 12 V adapter powers the amplifier directly, while an MP1584 buck converter steps the same supply down to 5 V for the ESP32. An SSD1306 OLED shows a small beat-reactive smiley and bar visualizer while music is playing.

This is not the tiny “play a tone from a buzzer” version of a speaker. The PCM5102A keeps the ESP32 out of the analog-audio path, and the XH-A232 gives enough output for real 4–8 Ω passive speakers. The important bit is the power split: 12 V goes only to the amp and buck input; the ESP32 gets 5 V on VIN from the buck; all grounds are shared so the audio reference is stable.

The firmware has four jobs: make the ESP32 discoverable as SmartSpeaker, stream Bluetooth A2DP audio out through I2S on GPIO 26 (BCK), 25 (LCK), and 22 (DIN), drive the PCM5102A's XSMT soft-mute pin from GPIO 27 so the DAC is unmuted while audio is streaming and muted otherwise, and update the OLED visualizer from a read-only audio callback. If your PCM5102A breakout labels the data pin as DATA instead of DIN, wire it exactly where this guide says DIN; it is the same I2S data input.
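The visualizer's level meter can be tried off-device. The sketch below mirrors the arithmetic in the firmware's audio callback: it averages interleaved left/right samples, takes the RMS, and maps it through a square-root curve with a gain of 22. The function names `stereoRms` and `levelFromRms` are illustrative and do not appear in the firmware.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Mono RMS of interleaved 16-bit stereo frames (L, R, L, R, ...), scaled to 0..1.
float stereoRms(const int16_t *frames, std::size_t frameCount) {
    assert(frames != nullptr || frameCount == 0);
    if (frameCount == 0) return 0.0f;
    float sum = 0.0f;
    for (std::size_t i = 0; i < frameCount; i++) {
        // Add left and right in 32-bit so the sum cannot overflow int16_t.
        float s = (((int32_t)frames[2 * i] + frames[2 * i + 1]) / 2.0f) / 32768.0f;
        sum += s * s;
    }
    return std::sqrt(sum / frameCount);
}

// Square-root curve with the firmware's gain of 22, clamped to 1.0.
float levelFromRms(float rms) {
    return std::fmin(std::sqrt(rms * 22.0f), 1.0f);
}
```

With this curve an RMS of 0.01 (about -40 dBFS) already maps to roughly 0.47, and the meter reaches full scale at an RMS of about 0.045. That is why the animation stays lively at low phone volume.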

Upload and calibrate

Open the starter in Schematik and deploy it to an ESP32 DevKit v1. Open Serial Monitor at 115200 baud. A healthy boot prints a Bluetooth-ready message and the board appears as SmartSpeaker in your phone or laptop Bluetooth list within a few seconds.

Pair with SmartSpeaker, start music at low volume, then raise the volume from the phone or laptop. The XH-A232 does the speaker driving; the ESP32 and PCM5102A only handle the digital and line-level stages. The OLED should show the smiley and bars reacting to audio. If the OLED is blank but audio works, re-check the display address and confirm that SCL is wired to GPIO 23, not the ESP32's default I2C clock pin GPIO 22, which this build uses for I2S data.
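One common source of address confusion: some SSD1306 modules print the address in 8-bit form (0x78 or 0x7A) on the silkscreen, while the firmware's OLED_ADDR needs the 7-bit form (0x3C or 0x3D). The helper names below are illustrative; this is a minimal sketch of the conversion, not part of the firmware.

```cpp
#include <cassert>
#include <cstdint>

// Convert an 8-bit I2C write address (as printed on some OLED modules)
// to the 7-bit form used for OLED_ADDR.
uint8_t toSevenBitAddress(uint8_t eightBitWriteAddress) {
    assert((eightBitWriteAddress & 0x01) == 0);  // write addresses have R/W bit = 0
    return static_cast<uint8_t>(eightBitWriteAddress >> 1);
}

// True if the value is one of the two 7-bit addresses an SSD1306 can use.
bool isSsd1306Address(uint8_t sevenBitAddress) {
    return sevenBitAddress == 0x3C || sevenBitAddress == 0x3D;
}
```

If your module is marked 0x78, leave OLED_ADDR at 0x3C; if it is marked 0x7A, change it to 0x3D.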

Set the phone volume low before the first power-on. The XH-A232 can get loud quickly on efficient speakers, and a wiring mistake on the analog input is easier to catch before the amplifier is working hard.

Troubleshooting

  • SmartSpeaker does not appear in Bluetooth scans. Reset the ESP32 and watch Serial Monitor. The firmware starts Bluetooth Classic A2DP near the end of setup(); if the board reboots repeatedly before that point, check for a power dip from the buck converter or an incorrect library install.
  • Compile says BluetoothA2DPSink.h is missing. Use the GitHub install URL for the ESP32-A2DP library, not the short package name. The starter declares the library from https://github.com/pschatzmann/ESP32-A2DP for that reason.
  • PCM5102A has DATA, not DIN. Wire DATA to ESP32 GPIO 22. On these DAC modules DATA and DIN are two labels for the same I2S data input.
  • Audio plays but there is high-pitched noise. Shorten the I2S and analog wires, keep the 12 V amplifier wiring away from the DAC outputs, and confirm AGND and GND join at the shared ground.
  • OLED stays blank. Confirm VCC is on 3V3, SDA is GPIO 21, SCL is GPIO 23, and the module address is 0x3C. Some boards use 0x3D.
  • ESP32 resets when music gets loud. The amplifier is pulling the 12 V supply down or injecting noise into the ground. Use a supply with more current headroom, twist the power leads, and keep the buck output wiring short.
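To judge whether a supply has enough current headroom, a rough estimate helps. The sketch below is a back-of-envelope calculation under stated assumptions: the amplifier efficiency (90%), buck efficiency (85%), and ESP32-side power draw (1.5 W) are placeholder figures chosen for illustration, not measurements of this build. Substitute your own numbers.

```cpp
#include <cassert>

// Rough 12 V supply current estimate. Every default below is an illustrative
// assumption, not a measured or datasheet figure.
struct SupplyEstimate {
    float ampAmps;    // current the amplifier draws from the 12 V rail
    float logicAmps;  // current the buck converter input draws
    float totalAmps;
};

SupplyEstimate estimateSupplyCurrent(float speakerWattsTotal,
                                     float supplyVolts = 12.0f,
                                     float ampEfficiency = 0.90f,
                                     float logicWatts = 1.5f,
                                     float buckEfficiency = 0.85f) {
    assert(supplyVolts > 0.0f && ampEfficiency > 0.0f && buckEfficiency > 0.0f);
    SupplyEstimate e;
    e.ampAmps = speakerWattsTotal / ampEfficiency / supplyVolts;
    e.logicAmps = logicWatts / buckEfficiency / supplyVolts;
    e.totalAmps = e.ampAmps + e.logicAmps;
    return e;
}

// True if the adapter rating covers the estimate plus a safety margin.
bool hasHeadroom(const SupplyEstimate &e, float adapterAmps, float margin = 0.10f) {
    return e.totalAmps * (1.0f + margin) <= adapterAmps;
}
```

Under these assumptions, 8 W per channel works out to roughly 1.6 A, which a 2 A adapter covers. 12 W per channel works out to roughly 2.4 A, which it does not.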

Going further

Once the basic speaker is solid, move it from breadboard to perfboard or a small enclosure. Add a physical power switch on the 12 V input, strain relief for speaker wires, and a front panel for the OLED. If you want cleaner audio at higher volume, use shielded cable between the PCM5102A and XH-A232 and add local decoupling near the amplifier power input.

Wiring diagram

[Interactive wiring diagram]

Components needed


Assembly

1

Gather all parts

Collect the ESP32 DevKit v1, PCM5102A I2S DAC module, XH-A232 TPA3110 amplifier board, MP1584 buck converter module (adjustable; you will set it to 5 V in step 3), SSD1306 OLED display (128×64, I2C), 12 V / 2 A barrel-jack adapter, two 4–8 Ω passive speakers (≥15 W each), and a breadboard or perfboard for connections.

2

Set up the 12V power rail

Wire the 12V barrel-jack adapter output to two places: (a) the XH-A232 VCC (+) and GND (−) screw terminals, and (b) the MP1584 buck converter VIN (+) and GND (−) input pads.

3

Adjust and verify the buck converter to 5V

Connect ONLY the 12V adapter (no ESP32 yet). Power on, measure the MP1584 VOUT with a multimeter, and turn its trim potentiometer until VOUT reads 5.0V. Power off before proceeding.
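For context on what the trim potentiometer does: the MP1584 regulates its output so that a resistor divider from VOUT feeds 0.8 V back to its FB pin (the reference voltage given in the MP1584 datasheet), and the trim pot forms part of that divider. The sketch below shows the relationship. The 8.2 kΩ bottom resistor in the usage note is a hypothetical value for illustration; check your own module.

```cpp
#include <cassert>

// Feedback reference of the MP1584 (0.8 V per its datasheet).
const double kFeedbackVolts = 0.8;

// Output voltage for a divider with rTop (VOUT to FB) and rBottom (FB to GND).
double buckOutputVolts(double rTopOhms, double rBottomOhms) {
    assert(rBottomOhms > 0.0);
    return kFeedbackVolts * (1.0 + rTopOhms / rBottomOhms);
}

// Top resistance needed to reach a target output voltage.
double topResistorOhms(double targetVolts, double rBottomOhms) {
    assert(targetVolts >= kFeedbackVolts);
    return rBottomOhms * (targetVolts / kFeedbackVolts - 1.0);
}
```

With a hypothetical 8.2 kΩ bottom resistor, a 5.0 V output needs about 43 kΩ on the top leg. Small turns of the pot move the output noticeably, so adjust slowly while watching the multimeter.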

4

Power the ESP32 from the buck converter

Connect the MP1584 VOUT (+5V) to the ESP32 DevKit VIN pin, and MP1584 GND to ESP32 GND.

5

Connect ESP32 to PCM5102A (I2S)

Wire the I2S bus between the ESP32 and the PCM5102A module:
  • GPIO 26 → PCM5102A BCK
  • GPIO 25 → PCM5102A LCK (LRCLK)
  • GPIO 22 → PCM5102A DIN
Also connect:
  • GPIO 27 → PCM5102A XSMT (soft mute, driven by the firmware)
  • ESP32 3V3 → PCM5102A VCC
  • ESP32 GND → PCM5102A GND and AGND

6

Connect PCM5102A analog output to XH-A232 input

Connect the PCM5102A line-level outputs to the amplifier input terminals:
  • PCM5102A AOUTL → XH-A232 L_IN (left input)
  • PCM5102A AOUTR → XH-A232 R_IN (right input)
  • PCM5102A AGND → XH-A232 AGND (signal ground)
Use short shielded wire or twisted pairs to minimise hum.

7

Connect the speakers

Connect each speaker to the XH-A232 screw terminals:
  • Left speaker + → XH-A232 L+
  • Left speaker − → XH-A232 L−
  • Right speaker + → XH-A232 R+
  • Right speaker − → XH-A232 R−

8

Final check and power-on

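Double-check all connections against the wiring diagram. The steps above do not cover the SSD1306 OLED, so wire it now per the pin assignments: VCC → 3V3, GND → GND, SDA → GPIO 21, SCL → GPIO 23. Plug in the 12V adapter. The ESP32 should boot, Serial Monitor should print a Bluetooth-ready message, and the OLED should show "Waiting for BT...". Pair your phone or laptop with the Bluetooth device named 'SmartSpeaker' and start playing audio.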

Pin assignments

Pin | Connection | Type
VIN (5V) | buck_5v VOUT | POWER
GND | buck_5v GND | GROUND
3V3 | oled VCC | POWER
GND | oled GND | GROUND
GPIO 21 | oled SDA | I2C
GPIO 23 | oled SCL | I2C
3V3 | pcm5102 VCC | POWER
GND | pcm5102 GND | GROUND
GND | pcm5102 AGND | GROUND
GPIO 26 | pcm5102 BCK | DATA
GPIO 25 | pcm5102 LCK | DATA
GPIO 22 | pcm5102 DIN | DATA
GPIO 27 | pcm5102 XSMT | DIGITAL
EXT | pcm5102 AOUTL → XH-A232 TPA3110 Stereo Amplifier Board L_IN | ANALOG
EXT | pcm5102 AOUTR → XH-A232 TPA3110 Stereo Amplifier Board R_IN | ANALOG
EXT | pcm5102 AGND → XH-A232 TPA3110 Stereo Amplifier Board AGND | GROUND
EXT | psu_12v +12V → MP1584 Buck Converter VIN | POWER
EXT | psu_12v GND → MP1584 Buck Converter GND | GROUND
EXT | psu_12v +12V → XH-A232 TPA3110 Stereo Amplifier Board VCC | POWER
EXT | psu_12v GND → XH-A232 TPA3110 Stereo Amplifier Board GND | GROUND
EXT | xha232 L+ → Left Speaker SP+ | ANALOG
EXT | xha232 L- → Left Speaker SP- | ANALOG
EXT | xha232 R+ → Right Speaker SP+ | ANALOG
EXT | xha232 R- → Right Speaker SP- | ANALOG
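If you remap any pins, a quick duplicate check catches the most likely mistake: leaving the OLED clock on GPIO 22, which collides with the I2S data line. The sketch below uses the GPIO numbers from the pin assignments. The `PinUse` struct and `allUnique` helper are illustrative names, not part of the firmware.

```cpp
#include <cassert>
#include <cstddef>

struct PinUse {
    int gpio;
    const char *role;
};

// GPIO assignments taken from the pin assignments in this guide.
const PinUse kPins[] = {
    {26, "I2S BCK"}, {25, "I2S LCK"},  {22, "I2S DIN"},
    {27, "XSMT"},    {21, "OLED SDA"}, {23, "OLED SCL"},
};
const std::size_t kPinCount = sizeof(kPins) / sizeof(kPins[0]);

// Returns true when no GPIO number appears twice in the list.
bool allUnique(const PinUse *pins, std::size_t count) {
    assert(pins != nullptr);
    for (std::size_t i = 0; i < count; i++)
        for (std::size_t j = i + 1; j < count; j++)
            if (pins[i].gpio == pins[j].gpio) return false;
    return true;
}
```

Calling `allUnique(kPins, kPinCount)` returns true for the assignments in this guide. It returns false if OLED SCL is put back on GPIO 22.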

Code

Arduino C++
#include <Arduino.h>
#include "BluetoothA2DPSink.h"
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <math.h>

// ── Pins ──────────────────────────────────────────────────────────────────────
#define I2S_BCK_PIN  26
#define I2S_LCK_PIN  25
#define I2S_DIN_PIN  22
#define XSMT_PIN     27
#define OLED_SDA_PIN 21
#define OLED_SCL_PIN 23

// ── OLED ──────────────────────────────────────────────────────────────────────
#define SCREEN_WIDTH  128
#define SCREEN_HEIGHT  64
#define OLED_ADDR     0x3C

// ── Visualizer ────────────────────────────────────────────────────────────────
#define NUM_BARS      16
#define BAR_MAX_H     16
#define BUF_SAMPLES   512

// ── Globals ───────────────────────────────────────────────────────────────────
void audio_data_callback(const uint8_t *data, uint32_t len);
void connection_state_changed(esp_a2d_connection_state_t state, void *ptr);
void audio_state_changed(esp_a2d_audio_state_t state, void *ptr);
void drawCuteSmiley(int cx, int cy, int r, bool excited);
void drawArms(int cx, int cy, int r, float e, uint32_t t, int move);
void updateBars();
void drawVisualizer();
void displayTask(void *param);

static int16_t           audioBuf[BUF_SAMPLES];
static volatile uint16_t audioBufLen = 0;
static SemaphoreHandle_t bufMutex;

static float    bars[NUM_BARS];
static volatile float energy     = 0.0f;
static volatile float energyFast = 0.0f;
static volatile bool  isStreaming = false;

// Beat detection for triggering dance moves
static volatile float  beatAvg   = 0.0f;
static volatile bool   beatHit   = false;

Adafruit_SSD1306  display(SCREEN_WIDTH, SCREEN_HEIGHT, &Wire, -1);
BluetoothA2DPSink a2dp_sink;

// ── Audio tap: read-only hook.
//    The second argument passed to set_stream_reader() in setup() (true) leaves
//    the library's own I2S output enabled, so this callback only observes each
//    PCM block. It does NOT replace or disable the I2S output path. ─────────────
void audio_data_callback(const uint8_t *data, uint32_t len) {
    const int16_t *src = (const int16_t *)data;
    uint32_t stereo    = len / 4;
    uint32_t n         = stereo < BUF_SAMPLES ? stereo : BUF_SAMPLES;

    float sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        float s = (((int32_t)src[i*2] + src[i*2+1]) / 2.0f) / 32768.0f;
        sum += s * s;
    }
    float rms = (n > 0) ? sqrtf(sum / n) : 0.0f;
    // Perceptual (sqrt) curve + higher gain: quiet passages get boosted much
    // more than loud ones, so the animation reacts well at low volume without
    // the loud parts clipping flat at 1.0.
    float e   = fminf(sqrtf(rms * 22.0f), 1.0f);

    // Instant attack, quick decay — strong beat response
    energyFast = (e > energyFast) ? e : energyFast * 0.82f;
    energy     = energy * 0.7f + e * 0.3f;

    // Simple beat detector: flag when energy spikes above a slow running average.
    // Slower average (0.98) makes transient kicks stand out; lower threshold and
    // floor so even quiet music reliably triggers move changes.
    beatAvg = beatAvg * 0.98f + e * 0.02f;
    if (e > beatAvg * 1.22f && e > 0.06f) beatHit = true;

    if (xSemaphoreTake(bufMutex, 0) == pdTRUE) {
        for (uint32_t i = 0; i < n; i++)
            audioBuf[i] = (int16_t)(((int32_t)src[i*2] + src[i*2+1]) >> 1);
        audioBufLen = (uint16_t)n;
        xSemaphoreGive(bufMutex);
    }
}

// ── BT callbacks ──────────────────────────────────────────────────────────────
void connection_state_changed(esp_a2d_connection_state_t state, void *ptr) {
    Serial.println(state == ESP_A2D_CONNECTION_STATE_CONNECTED
        ? "[BT] Connected" : "[BT] Disconnected");
}

void audio_state_changed(esp_a2d_audio_state_t state, void *ptr) {
    if (state == ESP_A2D_AUDIO_STATE_STARTED) {
        isStreaming = true;
        digitalWrite(XSMT_PIN, HIGH);
        Serial.println("[BT] Streaming started");
    } else {
        isStreaming = false;
        digitalWrite(XSMT_PIN, LOW);
        Serial.println("[BT] Streaming stopped");
        for (int i = 0; i < NUM_BARS; i++) bars[i] = 0;
        energy = 0; energyFast = 0;
    }
}

// ── Cute happy smiley ────────────────────────────────────────────────────────
//   Big round head, large shiny eyes, rosy cheeks, real upward grin.
void drawCuteSmiley(int cx, int cy, int r, bool excited) {
    // Head — double-stroke for a bold look
    display.drawCircle(cx, cy, r,     SSD1306_WHITE);
    display.drawCircle(cx, cy, r - 1, SSD1306_WHITE);

    int eo = r / 3 + 2;          // eye spacing
    int ey = cy - r / 5;         // eyes in upper half
    int er = max(r / 4, 3);      // eye radius

    if (excited) {
        // Happy "^ ^" closed eyes on a strong beat. Screen +Y points down, so
        // subtracting dy lifts the middle of each arc into a "^" shape.
        for (int dx = -er; dx <= er; dx++) {
            int dy = (int)(sqrtf((float)(er*er - dx*dx)) * 0.7f);
            display.drawPixel(cx - eo + dx, ey - dy, SSD1306_WHITE);
            display.drawPixel(cx - eo + dx, ey - dy + 1, SSD1306_WHITE);
            display.drawPixel(cx + eo + dx, ey - dy, SSD1306_WHITE);
            display.drawPixel(cx + eo + dx, ey - dy + 1, SSD1306_WHITE);
        }
    } else {
        // Big shiny round eyes
        display.fillCircle(cx - eo, ey, er, SSD1306_WHITE);
        display.fillCircle(cx + eo, ey, er, SSD1306_WHITE);
        display.fillCircle(cx - eo + 1, ey - 1, 1, SSD1306_BLACK); // shine
        display.fillCircle(cx + eo + 1, ey - 1, 1, SSD1306_BLACK);
    }

    // Rosy cheeks
    int ck  = r * 3 / 4;
    int cky = cy + r / 5;
    display.drawCircle(cx - ck, cky, 2, SSD1306_WHITE);
    display.drawCircle(cx + ck, cky, 2, SSD1306_WHITE);

    // ── Upward grin ── on screen +Y is DOWN, so a smile dips in the MIDDLE
    int mw    = r * 3 / 5;            // mouth half-width
    int my    = cy + r / 5;           // mouth corners baseline
    int depth = excited ? (r * 3 / 5) : (r * 2 / 5);
    for (int dx = -mw; dx <= mw; dx++) {
        float t  = (float)dx / mw;            // -1..1
        int   dy = (int)((1.0f - t * t) * depth);   // center dips DOWN = smile
        display.drawPixel(cx + dx, my + dy,     SSD1306_WHITE);
        display.drawPixel(cx + dx, my + dy + 1, SSD1306_WHITE);
    }
}

// ── Dancing arms — move = current dance move (0 wave, 1 raise-the-roof, 2 sway) ─
void drawArms(int cx, int cy, int r, float e, uint32_t t, int move) {
    int sx = r;
    int sy = cy + r / 3;
    float osc  = sinf((float)t / 220.0f);
    float beat = fminf(e * 26.0f, 22.0f);

    int lx2, ly2, rx2, ry2;
    if (move == 1) {
        // "Raise the roof" — both hands punch straight up on the beat
        int lift = (int)(8 + beat);
        lx2 = cx - sx - 4; ly2 = sy - lift;
        rx2 = cx + sx + 4; ry2 = sy - lift;
    } else if (move == 2) {
        // Side sway — both arms swing the same way, flipping with the oscillator
        int swing = (int)(osc * (10 + beat * 0.5f));
        lx2 = cx - sx - 12 + swing; ly2 = sy + 2;
        rx2 = cx + sx + 12 + swing; ry2 = sy + 2;
    } else {
        // Wave — arms alternate up/down (classic dance)
        lx2 = cx - sx - 12; ly2 = (int)(sy - beat + osc * 5.0f);
        rx2 = cx + sx + 12; ry2 = (int)(sy - beat - osc * 5.0f);
    }

    display.drawLine(cx - sx, sy, lx2, ly2, SSD1306_WHITE);
    display.fillCircle(lx2, ly2, 2, SSD1306_WHITE);
    display.drawLine(cx + sx, sy, rx2, ry2, SSD1306_WHITE);
    display.fillCircle(rx2, ry2, 2, SSD1306_WHITE);
}

// ── Bars ──────────────────────────────────────────────────────────────────────
void updateBars() {
    static int16_t snap[BUF_SAMPLES];
    uint16_t snapLen = 0;
    if (xSemaphoreTake(bufMutex, pdMS_TO_TICKS(4)) == pdTRUE) {
        snapLen = audioBufLen;
        memcpy(snap, audioBuf, snapLen * sizeof(int16_t));
        xSemaphoreGive(bufMutex);
    }
    if (snapLen == 0) return;

    int spb = max(1, (int)snapLen / NUM_BARS);
    for (int b = 0; b < NUM_BARS; b++) {
        int start = b * spb;
        int end   = min(start + spb, (int)snapLen);
        float sum = 0; int cnt = 0;
        for (int i = start; i < end; i++) {
            float s = snap[i] / 32768.0f;
            sum += s * s; cnt++;
        }
        float rms    = (cnt > 0) ? sqrtf(sum / cnt) : 0.0f;
        // Same perceptual sqrt curve as the energy meter so quiet bars still
        // rise to a visible height instead of sitting flat at the bottom.
        float target = fminf(sqrtf(rms * 22.0f), 1.0f);
        bars[b] = (target > bars[b]) ? target : bars[b] * 0.78f;
    }
}

// ── Main draw (core 0) ───────────────────────────────────────────────────────
void drawVisualizer() {
    static int      danceMove   = 0;
    static uint32_t lastSwitch  = 0;

    display.clearDisplay();
    uint32_t now = millis();

    if (!isStreaming) {
        drawCuteSmiley(64, 26, 16, false);
        display.setTextSize(1);
        display.setTextColor(SSD1306_WHITE);
        display.setCursor(16, 52);
        display.print("Waiting for BT...");
        display.display();
        return;
    }

    updateBars();

    // Cycle through the dance moves:
    //  - advance on a detected beat (min 700ms apart so a move is visible), OR
    //  - advance on a 2.5s timeout so it ALWAYS keeps alternating even when the
    //    beat detector is quiet. Each move gets at least one full bar to show.
    bool beat = beatHit;
    beatHit = false;
    if ((beat && now - lastSwitch > 700) || (now - lastSwitch > 2500)) {
        danceMove  = (danceMove + 1) % 3;
        lastSwitch = now;
    }

    float ef  = energyFast;
    bool  exc = ef > 0.35f;                  // big excited face on strong beats
    int   r   = 16 + (int)(ef * 8.0f);       // 16..24 — clearly grows
    int   bcy = 24 - (int)(ef * 6.0f);       // bounces up on beat

    drawArms(64, bcy, r, ef, now, danceMove);
    drawCuteSmiley(64, bcy, r, exc);

    // Full-width spectrum bars across the bottom
    int barW = SCREEN_WIDTH / NUM_BARS;
    for (int b = 0; b < NUM_BARS; b++) {
        int x = b * barW;
        int h = (int)(bars[b] * BAR_MAX_H);
        if (h > 0) display.fillRect(x, SCREEN_HEIGHT - h, barW - 1, h, SSD1306_WHITE);
    }

    display.display();
}

// ── Display task pinned to core 0 (keeps I2C off the BT/I2S core) ────────────
void displayTask(void *param) {
    for (;;) {
        drawVisualizer();
        vTaskDelay(pdMS_TO_TICKS(33));
    }
}

// ── Setup ─────────────────────────────────────────────────────────────────────
void setup() {
    Serial.begin(115200);
    Serial.println("SCHEMATIK SPEAKER starting...");

    pinMode(XSMT_PIN, OUTPUT);
    digitalWrite(XSMT_PIN, LOW);

    // I2C bus recovery before Wire.begin (clears a stuck SSD1306)
    pinMode(OLED_SCL_PIN, OUTPUT);
    pinMode(OLED_SDA_PIN, OUTPUT);
    digitalWrite(OLED_SDA_PIN, HIGH);
    for (int i = 0; i < 9; i++) {
        digitalWrite(OLED_SCL_PIN, HIGH); delayMicroseconds(5);
        digitalWrite(OLED_SCL_PIN, LOW);  delayMicroseconds(5);
    }
    digitalWrite(OLED_SDA_PIN, LOW);
    digitalWrite(OLED_SCL_PIN, HIGH); delayMicroseconds(5);
    digitalWrite(OLED_SDA_PIN, HIGH); delayMicroseconds(5);
    delay(50);

    Wire.begin(OLED_SDA_PIN, OLED_SCL_PIN);
    Wire.setClock(100000);
    if (!display.begin(SSD1306_SWITCHCAPVCC, OLED_ADDR)) {
        Serial.println("[OLED] Init failed — check wiring/address");
    } else {
        display.clearDisplay();
        drawCuteSmiley(64, 26, 16, false);
        display.display();
        Serial.println("[OLED] Ready");
    }

    bufMutex = xSemaphoreCreateMutex();

    // Tell the library which I2S pins to drive (legacy I2S API).
    i2s_pin_config_t pin_config = {
        .bck_io_num   = I2S_BCK_PIN,
        .ws_io_num    = I2S_LCK_PIN,
        .data_out_num = I2S_DIN_PIN,
        .data_in_num  = I2S_PIN_NO_CHANGE
    };
    a2dp_sink.set_pin_config(pin_config);

    a2dp_sink.set_on_connection_state_changed(connection_state_changed);
    a2dp_sink.set_on_audio_state_changed(audio_state_changed);

    // IMPORTANT: the second argument (true) keeps the library's own I2S output
    // enabled, so it still installs and writes I2S itself; the callback is only a tap.
    a2dp_sink.set_stream_reader(audio_data_callback, true);

    a2dp_sink.start("SmartSpeaker");
    Serial.println("Bluetooth ready: pair with \"SmartSpeaker\"");

    xTaskCreatePinnedToCore(displayTask, "display", 4096, NULL, 1, NULL, 0);
}

void loop() {
    vTaskDelay(pdMS_TO_TICKS(1000));
}

Libraries: ESP32-A2DP, Adafruit SSD1306, Adafruit GFX Library
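The beat detector inside audio_data_callback can also be exercised on a desktop compiler. The sketch below isolates that logic using the same constants as the firmware (0.98 smoothing, 1.22 threshold ratio, 0.06 floor). The `BeatDetector` struct is an illustrative wrapper and does not exist in the sketch above.

```cpp
#include <cassert>

// Flags a beat when the current level spikes above a slow running average.
struct BeatDetector {
    float average = 0.0f;

    // level is the 0..1 energy value computed for one audio block.
    bool update(float level) {
        assert(level >= 0.0f && level <= 1.0f);
        // The average is updated first, so a spike also raises its own threshold slightly.
        average = average * 0.98f + level * 0.02f;
        return level > average * 1.22f && level > 0.06f;
    }
};
```

A steady level never triggers once the average has caught up, because the level would have to exceed 1.22 times itself. Starting from an average of zero, the first blocks of a track do register as beats until the average converges. That is harmless here because the firmware limits dance-move changes to one every 700 ms.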