【Connecting the ESP32S3 to iFlytek Online Speech Recognition】

Video:

【Connecting the ESP32S3 to iFlytek Online Speech Recognition】

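1. Introduction

This tutorial uses a Seeed XIAO ESP32S3 Sense development board to connect to iFlytek (讯飞) for online speech recognition. The board's microphone module serves as the voice input, and sending the character "1" over the serial port triggers audio capture and upload.

Speech recognition comparison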

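Platform | API | Tutorial | Score
Baidu | https://ai.baidu.com/tech/speech | 【Connecting the ESP32S3 Sense to Baidu Online Speech Recognition】 | 7 points
iFlytek | https://console.xfyun.cn/services/iat | 【Connecting the ESP32S3 to iFlytek Online Speech Recognition】 | 8 points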

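1.1 Overview of the steps

(1) In the iFlytek console, select speech recognition and create an application to obtain the APPID, APISecret, and APIKey.
(2) Capture audio data, package it in the required format, and send it to the API over a WebSocket connection.
(3) Receive the recognition results that come back.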


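1.2 Hardware

To follow this tutorial you need one ESP32S3 development board.

This is the official ESP32S3 hardware I currently use 👍👍👍 (small in size, big in power). It costs only 35 yuan, or 79 yuan with the camera and microphone. Later I will put together a dedicated column for learning Arduino systematically 😘😘😘. If you need one, you can buy a XIAO development board 💕💕💕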

  1. Seeed XIAO ESP32S3 Sense purchase link: https://s.click.taobao.com/lekazrt

  2. ESP32-S3-CAM core development board (N16R8, Wi-Fi and Bluetooth module, OV2640 camera) purchase link: https://s.click.taobao.com/1PTagos

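Choose either of the two ESP32S3 boards above. You also need an INMP441 omnidirectional MEMS microphone module (high precision, low power, I2S interface, works with the ESP32).
【Order link】: https://s.click.taobao.com/sNGL3as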

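1.3 Wiring

Wire the microphone as follows:

INMP441 | ESP32S3
I2S_WS | GPIO17
I2S_SD | GPIO3
I2S_SCK | GPIO18
VCC | 3V3
GND | GND

Power the INMP441 from the 3V3 pin: its supply range is 1.62 V to 3.63 V, so it is not a 5 V part.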

[Figure: photo of the assembled hardware]

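2. Procedure

2.1 Create a speech recognition application

Log in to your iFlytek account, open the console, and select speech recognition.
  Official site: https://www.xfyun.cn/

New users can claim free resources directly, or you can pay for access. Create an application, fill in the required information, and submit it.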
Select speech recognition, then either claim the free resources or enable paid usage, depending on your situation: https://console.xfyun.cn/services/iat

New users can claim free speech resources; note that they expire after one year 🤣🤣🤣: https://www.xfyun.cn/services/voicedictation?target=price

2.2 Record the API keys

Note down your own interface credentials. There are three parameters:

// iFlytek STT credentials
String STTAPPID = "99d038";
const char *STTAPISecret = "YzMxNGRkODJlNjVjMDUZTc5MjFh";
const char *STTAPIKey = "aa09fe34b2c3d8c327222463f8e74";

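3. JSON speech API

Speech dictation (streaming) WebAPI documentation: https://www.xfyun.cn/doc/asr/voicedictation/API.html

The streaming speech dictation interface provides real-time speech-to-text for audio of up to one minute. It returns results as they become available, so recognized text arrives while the audio is still being uploaded.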
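To get a feel for what the one-minute limit means for memory on the board, here is a small arithmetic sketch. It assumes 16 kHz, 16-bit mono PCM (the format used later in this post); the function name `pcmBytes` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to hold `seconds` of mono PCM audio.
// Assumption: 16-bit samples (2 bytes each) unless stated otherwise.
constexpr std::size_t pcmBytes(std::uint32_t sampleRateHz, std::uint32_t seconds,
                               std::size_t bytesPerSample = 2)
{
  return static_cast<std::size_t>(sampleRateHz) * seconds * bytesPerSample;
}

// A full one-minute session at 16 kHz needs 1,920,000 bytes.
static_assert(pcmBytes(16000, 60) == 1920000, "60 s at 16 kHz, 16-bit mono");
// The 3 s recording used in this tutorial needs 96,000 bytes.
static_assert(pcmBytes(16000, 3) == 96000, "3 s at 16 kHz, 16-bit mono");
```

A full minute of audio is larger than the 512 KB of internal SRAM on the ESP32-S3, which is one reason to keep the recording buffer in PSRAM.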

3.1 JSON format

Interface requirements:
[Figure: interface requirements table]

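3.2 Interaction flow

Once the handshake succeeds, the client and server hold a WebSocket connection over which the client can upload and receive data at the same time. When the server has a recognition result, it pushes it to the client over that connection. If frames are sent with too short an interval between them, the engine may misrecognize the audio.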
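The remark about send intervals is easier to reason about once you know how much audio one frame carries. The sketch below assumes 16-bit mono PCM; `frameDurationMs` is a hypothetical helper. As far as I recall, iFlytek's documentation suggests roughly 1280 bytes every 40 ms for 16 kHz PCM, but treat that figure as an assumption and check the documentation linked above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Milliseconds of audio contained in one frame of `frameBytes` bytes.
// Assumption: 16-bit mono PCM, so two bytes per sample.
constexpr std::uint32_t frameDurationMs(std::size_t frameBytes, std::uint32_t sampleRateHz)
{
  return static_cast<std::uint32_t>((frameBytes / 2) * 1000 / sampleRateHz);
}

// 1280 bytes at 16 kHz is 40 ms of audio.
static_assert(frameDurationMs(1280, 16000) == 40, "1280-byte frame");
// The 10240-byte frames used in main.cpp carry 320 ms of audio each.
static_assert(frameDurationMs(1280 * 8, 16000) == 320, "10240-byte frame");
```

Since main.cpp pauses only 30 ms between 320 ms frames, it uploads a finished recording faster than real time. That worked in the serial log shown later in this post, but if recognition quality suffers, a longer delay between frames is the first thing to try.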

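3.3 ESP32S3 Sense integration code

This project uses the PlatformIO framework; the Arduino IDE works as well.

The request format spells out the data types and contents clearly, so you only need to package the data in that format and send it. The ESP32S3 implementation follows.
  main.cpp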

// Code reference: https://oshwhub.com/shukkkk/esp32s3_tft_mp3
#include <Arduino.h>
#include <Adafruit_NeoPixel.h>
#include <WiFi.h>
#include "time.h"
#include "esp_sntp.h"
#include <mbedtls/md.h>
#include <base64.h>
#include "Base64_Arturo.h"
#include <ArduinoWebsockets.h>
#include <ArduinoJson.h>
#include <driver/i2s.h>
#include "SPI.h"
#include <HTTPClient.h>
#include <NTPClient.h>
#include <WiFiUdp.h>
#include <esp_system.h>
#define PIN_PIXS 48
#define PIX_NUM 1

Adafruit_NeoPixel pixels(PIX_NUM, PIN_PIXS, NEO_GRB + NEO_KHZ800);
#define I2S_WS 17
#define I2S_SD 3
#define I2S_SCK 18
#define I2S_PORT_0 I2S_NUM_0
#define SAMPLE_RATE 16000
#define RECORD_TIME_SECONDS 10
#define BUFFER_SIZE (SAMPLE_RATE * RECORD_TIME_SECONDS)

#define CHUNK_SIZE 2048
const int recordTimeSeconds = 3; // recording duration in seconds
int16_t audioData[2560];
int16_t *pcm_data; // recording buffer
uint recordingSize = 0;

// char* psramBuffer = (char*)ps_malloc(512000);
String odl_answer;

String answer_list[10];
uint8_t answer_list_num = 0;
bool answer_ste = 0;

const char *ssid = "IQOO";
const char *password = "12345678";

// iFlytek STT credentials
String STTAPPID = "99d038";
const char *STTAPISecret = "YzMxNGRkODJlNjVjMDUZTc5MjFh";
const char *STTAPIKey = "aa09fe34b2c3d8c327222463f8e74";


using namespace websockets;
WebsocketsClient client;

const char *ntpServer1 = "ntp.org";
const char *ntpServer2 = "ntp.ntsc.ac.cn";
const long gmtOffset_sec = 3600;
const int daylightOffset_sec = 3600;
WiFiUDP ntpUDP;
NTPClient timeClient(ntpUDP, "pool.ntp.org");
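void setup_ntp_client()
{
  timeClient.begin();
  // Time zone offset in seconds, for reference:
  //   GMT +1 = 3600, GMT +8 = 28800, GMT -1 = -3600, GMT 0 = 0
  // Use 0 here: the timestamp is formatted with gmtime_r() and labelled
  // "GMT" in the signed date, so it has to stay in UTC.
  timeClient.setTimeOffset(0);
}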

bool timeste = 0;
String stttext = "";
bool sttste = 0;

String unixTimeToGMTString(time_t unixTime)
{
  char buffer[80];
  struct tm timeinfo;
  gmtime_r(&unixTime, &timeinfo);
  strftime(buffer, sizeof(buffer), "%a, %d %b %Y %H:%M:%S GMT", &timeinfo);
  return String(buffer);
}
String getDateTime()
{
  // Request the network time
  timeClient.update();

  unsigned long epochTime = timeClient.getEpochTime();
  Serial.print("Epoch Time: ");
  Serial.println(epochTime);

  String timeString = unixTimeToGMTString(epochTime);

  // Print the result
  Serial.println(timeString);
  return timeString;
}

void i2s_install()
{
  const i2s_config_t i2s_config = {
      .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
      .sample_rate = SAMPLE_RATE,
      .bits_per_sample = i2s_bits_per_sample_t(16),
      .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
      .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_STAND_I2S),
      .intr_alloc_flags = 0, // default interrupt priority
      .dma_buf_count = 8,
      .dma_buf_len = 1024,
      .use_apll = false};

  esp_err_t err = i2s_driver_install(I2S_PORT_0, &i2s_config, 0, NULL);
  if (err != ESP_OK)
  {
    Serial.printf("I2S driver install failed (I2S_PORT_0): %d\n", err);
    while (true)
      ;
  }

 
}

void i2s_setpin()
{
  const i2s_pin_config_t pin_config = {
      .bck_io_num = I2S_SCK,
      .ws_io_num = I2S_WS,
      .data_out_num = I2S_PIN_NO_CHANGE,
      .data_in_num = I2S_SD};

  esp_err_t err = i2s_set_pin(I2S_PORT_0, &pin_config);
  if (err != ESP_OK)
  {
    Serial.printf("I2S set pin failed (I2S_PORT_0): %d\n", err);
    while (true)
      ;
  }

  
}

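// URL-encode the characters that occur in the date string
String formatDateForURL(String dateString)
{
  // Replace spaces with "+", commas with "%2C", and colons with "%3A"
  dateString.replace(" ", "+");
  dateString.replace(",", "%2C");
  dateString.replace(":", "%3A");
  return dateString;
}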

// Build the iFlytek WebSocket connection URL
String XF_wsUrl(const char *Secret, const char *Key, String request, String host)
{
  String timeString = getDateTime();
  String signature_origin = "host: " + host;
  signature_origin += "\n";
  signature_origin += "date: ";
  signature_origin += timeString;
  signature_origin += "\n";
  signature_origin += "GET " + request + " HTTP/1.1";

  // Compute HMAC-SHA256 with mbedtls
  unsigned char hmacResult[32]; // a SHA-256 digest is 32 bytes long
  mbedtls_md_context_t ctx;
  mbedtls_md_type_t md_type = MBEDTLS_MD_SHA256;
  mbedtls_md_init(&ctx);
  mbedtls_md_setup(&ctx, mbedtls_md_info_from_type(md_type), 1); // 1 enables HMAC
  mbedtls_md_hmac_starts(&ctx, (const unsigned char *)Secret, strlen(Secret));
  mbedtls_md_hmac_update(&ctx, (const unsigned char *)signature_origin.c_str(), signature_origin.length());
  mbedtls_md_hmac_finish(&ctx, hmacResult);
  mbedtls_md_free(&ctx);
  // Base64-encode the result
  String base64Result = base64::encode(hmacResult, 32);

  String authorization_origin = "api_key=\"";
  authorization_origin += Key;
  authorization_origin += "\", algorithm=\"hmac-sha256\", headers=\"host date request-line\", signature=\"";
  authorization_origin += base64Result;
  authorization_origin += "\"";
  String authorization = base64::encode(authorization_origin);

  String url = "ws://" + host + request;
  url += "?authorization=";
  url += authorization;
  url += "&date=";
  url += formatDateForURL(timeString);
  url += "&host=" + host;
  return url;
}

// Send the recorded audio to iFlytek STT
void STTsend()
{
  uint8_t status = 0;
  int dataSize = 1280 * 8;                   // bytes per frame
  int audioDataSize = recordingSize * 2;     // recordingSize counts 16-bit samples
  uint lan = (audioDataSize) / dataSize;     // number of frames
  uint lan_end = (audioDataSize) % dataSize; // bytes in the final partial frame, if any
  if (lan_end > 0)
  {
    lan++;
  }

  // Serial.printf("byteDatasize: %d , lan: %d , lan_end: %d \n", audioDataSize, lan, lan_end);
  String host_url = XF_wsUrl(STTAPISecret, STTAPIKey, "/v2/iat", "ws-api.xfyun.cn");
  Serial.println("Connecting to server.");
  bool connected = client.connect(host_url);
  if (connected)
  {
    Serial.println("Connected!");
  }
  else
  {
    Serial.println("Not Connected!");
    return; // without a connection there is nothing to send to
  }
  // Send the PCM audio to STT in frames
  for (int i = 0; i < lan; i++)
  {
    if (i == (lan - 1))
    {
      status = 2;
    }
    if (status == 0)
    {
      // First frame: carries the common and business parameters
      String input = "{";
      input += "\"common\":{ \"app_id\":\"" + STTAPPID + "\" },";
      input += "\"business\":{\"domain\": \"iat\", \"language\": \"zh_cn\", \"accent\": \"mandarin\", \"vinfo\":1,\"vad_eos\":10000},";
      input += "\"data\":{\"status\": 0, \"format\": \"audio/L16;rate=16000\",\"encoding\": \"raw\",\"audio\":\"";
      String base64audioString = base64::encode((uint8_t *)pcm_data, dataSize);
      input += base64audioString;
      input += "\"}}";
      Serial.printf("input: %d , status: %d \n", i, status);
      client.send(input);
      status = 1;
    }
    else if (status == 1)
    {
      // Middle frame: audio only
      String input = "{";
      input += "\"data\":{\"status\": 1, \"format\": \"audio/L16;rate=16000\",\"encoding\": \"raw\",\"audio\":\"";
      String base64audioString = base64::encode((uint8_t *)pcm_data + (i * dataSize), dataSize);
      input += base64audioString;
      input += "\"}}";
      // Serial.printf("input: %d , status: %d \n", i, status);
      client.send(input);
    }
    else if (status == 2)
    {
      // Last frame: a full frame when the audio divides evenly, otherwise the remainder
      size_t lastSize = (lan_end > 0) ? lan_end : dataSize;
      String input = "{";
      input += "\"data\":{\"status\": 2, \"format\": \"audio/L16;rate=16000\",\"encoding\": \"raw\",\"audio\":\"";
      String base64audioString = base64::encode((uint8_t *)pcm_data + (i * dataSize), lastSize);
      input += base64audioString;
      input += "\"}}";
      Serial.printf("input: %d , status: %d \n", i, status);
      client.send(input);
    }
    delay(30);
  }
}


void setup()
{

  Serial.begin(115200);
  pinMode(48, OUTPUT);
  digitalWrite(48, HIGH);
  Serial.printf("Connecting to %s ", ssid);
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED)
  {
    delay(500);
    Serial.print(".");
  }
  Serial.println(" CONNECTED");

  setup_ntp_client();
  getDateTime();

  pixels.begin();
  pixels.setBrightness(8);
  // Turn off the WS2812
  pixels.clear();
  pixels.show();

  Serial.println("Setup I2S ...");
  i2s_install();
  i2s_setpin();
  esp_err_t err = i2s_start(I2S_PORT_0);
  if (err != ESP_OK)
  {
    Serial.printf("I2S start failed (I2S_PORT_0): %d\n", err);
    while (true)
      ;
  }

  // run callback when messages are received
  client.onMessage([&](WebsocketsMessage message) {
    // Callback for the STT WebSocket connection
    Serial.print("Got Message: ");
    Serial.println(message.data());
    JsonDocument doc;
    DeserializationError error = deserializeJson(doc, message.data());
    if (error)
    {
      Serial.print(F("deserializeJson() failed: "));
      Serial.println(error.f_str());
      return;
    }
    JsonArray ws = doc["data"]["result"]["ws"];
    for (JsonObject word : ws)
    {
      int bg = word["bg"];
      const char *w = word["cw"][0]["w"];
      stttext += w;
    }
    if (doc["data"]["status"] == 2)
    {
      // End-of-result flag received
      sttste = 1;
      Serial.print("语音识别:");
      Serial.println(stttext);
    }
  });

}


void loop()
{

  if (Serial.available())
  {
    // delay(20);
    if (Serial.read() == '1')
    {
      stttext = "";
      Serial.println("Recording...");
      size_t bytes_read = 0;
      recordingSize = 0;
      // Allocate pcm_data in PSRAM
      pcm_data = (int16_t *)ps_malloc(BUFFER_SIZE * sizeof(int16_t));
      if (!pcm_data)
      {
        Serial.println("Failed to allocate memory for pcm_data from PSRAM");
        return;
      }

      // uint16_t x = 0, y = 0;
      while (recordingSize < recordTimeSeconds* SAMPLE_RATE)
      {
        // Record in a loop and store the samples in pcm_data
        esp_err_t result = i2s_read(I2S_PORT_0, audioData, sizeof(audioData), &bytes_read, portMAX_DELAY);
        memcpy(pcm_data + recordingSize, audioData, bytes_read);
        recordingSize += bytes_read / 2;
      }

      // Note: recordingSize counts 16-bit samples, so the number printed here is a sample count
      Serial.printf("Total bytes read: %d\n", recordingSize);
      Serial.println("Recording complete.");
      STTsend(); // start the STT request
      free(pcm_data);
    }
  }

  if (client.available())
  {
            
            
      
    client.poll();
  }
  
  delay(50);
}


Replace the Wi-Fi and API parameters with your own:

const char *ssid = "IQOO";
const char *password = "12345678";

// iFlytek STT credentials
String STTAPPID = "99d038";
const char *STTAPISecret = "YzMxNGRkODJlNjVjMDUZTc5MjFh";
const char *STTAPIKey = "aa09fe34b2c3d8c327222463f8e74";
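For reference, the three kinds of frame that STTsend() assembles by string concatenation can be sketched in plain C++ as below. `buildFrame` is a hypothetical helper, not part of the original sketch or of any iFlytek SDK. The field values mirror those in main.cpp, minus the `vinfo` and `vad_eos` settings, and the sketch assumes the inputs contain nothing that needs JSON escaping.

```cpp
#include <cassert>
#include <string>

// Build one audio frame for the streaming API.
// status: 0 = first frame, 1 = middle frame, 2 = last frame.
// Only the first frame carries the "common" and "business" objects.
// Assumption: appId and base64Audio need no JSON escaping.
std::string buildFrame(int status, const std::string &appId, const std::string &base64Audio)
{
  std::string frame = "{";
  if (status == 0)
  {
    frame += "\"common\":{\"app_id\":\"" + appId + "\"},";
    frame += "\"business\":{\"domain\":\"iat\",\"language\":\"zh_cn\",\"accent\":\"mandarin\"},";
  }
  frame += "\"data\":{\"status\":" + std::to_string(status);
  frame += ",\"format\":\"audio/L16;rate=16000\",\"encoding\":\"raw\",\"audio\":\"";
  frame += base64Audio;
  frame += "\"}}";
  return frame;
}
```

Keeping the frame layout in one function makes it harder for the first, middle, and last frames to drift apart, which is easy to do when the same JSON text is repeated in several branches.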

Base64_Arturo.cpp

/*
Copyright (C) 2016 Arturo Guadalupi. All right reserved.

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
*/

#include "Base64_Arturo.h"
#include <Arduino.h>
#if (defined(__AVR__))
#include <avr/pgmspace.h>
#else
#include <pgmspace.h>
#endif

const char PROGMEM _Base64AlphabetTable[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789+/";

int Base64Class::encode(char *output, char *input, int inputLength) {
	int i = 0, j = 0;
	int encodedLength = 0;
	unsigned char A3[3];
	unsigned char A4[4];

	while(inputLength--) {
		A3[i++] = *(input++);
		if(i == 3) {
			fromA3ToA4(A4, A3);

			for(i = 0; i < 4; i++) {
				output[encodedLength++] = pgm_read_byte(&_Base64AlphabetTable[A4[i]]);
			}

			i = 0;
		}
	}

	if(i) {
		for(j = i; j < 3; j++) {
			A3[j] = '\0';
		}

		fromA3ToA4(A4, A3);

		for(j = 0; j < i + 1; j++) {
			output[encodedLength++] = pgm_read_byte(&_Base64AlphabetTable[A4[j]]);
		}

		while((i++ < 3)) {
			output[encodedLength++] = '=';
		}
	}
	output[encodedLength] = '\0';
	return encodedLength;
}

int Base64Class::decode(char * output, char * input, int inputLength) {
	int i = 0, j = 0;
	int decodedLength = 0;
	unsigned char A3[3];
	unsigned char A4[4];


	while (inputLength--) {
		if(*input == '=') {
			break;
		}

		A4[i++] = *(input++);
		if (i == 4) {
			for (i = 0; i <4; i++) {
				A4[i] = lookupTable(A4[i]);
			}

			fromA4ToA3(A3,A4);

			for (i = 0; i < 3; i++) {
				output[decodedLength++] = A3[i];
			}
			i = 0;
		}
	}

	if (i) {
		for (j = i; j < 4; j++) {
			A4[j] = '\0';
		}

		for (j = 0; j <4; j++) {
			A4[j] = lookupTable(A4[j]);
		}

		fromA4ToA3(A3,A4);

		for (j = 0; j < i - 1; j++) {
			output[decodedLength++] = A3[j];
		}
	}
	output[decodedLength] = '\0';
	return decodedLength;
}

int Base64Class::encodedLength(int plainLength) {
	int n = plainLength;
	return (n + 2 - ((n + 2) % 3)) / 3 * 4;
}

int Base64Class::decodedLength(char * input,  int inputLength) {
	int i = 0;
	int numEq = 0;
	for(i = inputLength - 1; input[i] == '='; i--) {
		numEq++;
	}

	long long result = ((6LL * inputLength) / 8) - numEq;
	//return ((6 * inputLength) / 8) - numEq;
	return static_cast<int>(result);
}

//Private utility functions
inline void Base64Class::fromA3ToA4(unsigned char * A4, unsigned char * A3) {
	A4[0] = (A3[0] & 0xfc) >> 2;
	A4[1] = ((A3[0] & 0x03) << 4) + ((A3[1] & 0xf0) >> 4);
	A4[2] = ((A3[1] & 0x0f) << 2) + ((A3[2] & 0xc0) >> 6);
	A4[3] = (A3[2] & 0x3f);
}

inline void Base64Class::fromA4ToA3(unsigned char * A3, unsigned char * A4) {
	A3[0] = (A4[0] << 2) + ((A4[1] & 0x30) >> 4);
	A3[1] = ((A4[1] & 0xf) << 4) + ((A4[2] & 0x3c) >> 2);
	A3[2] = ((A4[2] & 0x3) << 6) + A4[3];
}

inline unsigned char Base64Class::lookupTable(char c) {
	if(c >='A' && c <='Z') return c - 'A';
	if(c >='a' && c <='z') return c - 71;
	if(c >='0' && c <='9') return c + 4;
	if(c == '+') return 62;
	if(c == '/') return 63;
	return -1;
}

Base64Class Base64_Arturo;

Base64_Arturo.h

/*
Copyright (C) 2016 Arturo Guadalupi. All right reserved.

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
*/

#ifndef _BASE64_H
#define _BASE64_H

class Base64Class{
  public:
    int encode(char *output, char *input, int inputLength);
    int decode(char * output, char * input, int inputLength);
    int encodedLength(int plainLength);
    int decodedLength(char * input, int inputLength);

  private:
    inline void fromA3ToA4(unsigned char * A4, unsigned char * A3);
    inline void fromA4ToA3(unsigned char * A3, unsigned char * A4);
    inline unsigned char lookupTable(char c);
};
extern Base64Class Base64_Arturo;

#endif // _BASE64_H
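When calling `Base64Class::encode` with your own output buffer, it helps to know how large the encoded text will be. The sketch below is a standalone restatement of the usual Base64 length rule (every 3 input bytes become 4 output characters, with padding); `base64EncodedLength` is an illustrative name, not a function from the library above.

```cpp
#include <cassert>
#include <cstddef>

// Length of the Base64 text (including '=' padding, excluding the
// terminating NUL) produced from `plainLength` input bytes.
constexpr std::size_t base64EncodedLength(std::size_t plainLength)
{
  return (plainLength + 2) / 3 * 4;
}

// A 32-byte HMAC-SHA256 digest encodes to 44 characters.
static_assert(base64EncodedLength(32) == 44, "SHA-256 digest");
// One 10240-byte audio frame encodes to 13656 characters.
static_assert(base64EncodedLength(10240) == 13656, "audio frame");
```

Because `encode` also writes a terminating `'\0'`, an output buffer needs one byte more than this length.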

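The following is a brief functional overview of the code:

1. Core functionality

This is an ESP32-S3 based speech recognition system. It mainly implements:

  • Audio recording from a microphone (I2S interface)
  • Speech-to-text (STT) through the iFlytek open platform
  • Network time synchronization (NTP)
  • WS2812 LED status indication
  • Control over the serial port

2. Main modules

  1. Hardware configuration

    • WS2812 LED (GPIO48)
    • I2S microphone (GPIO17, 3, 18)
    • Wi-Fi network connection
  2. Audio processing

    • Captures audio data over the I2S interface
    • 16 kHz sample rate, 16-bit PCM
    • Records 3 seconds of audio (adjustable via recordTimeSeconds)
  3. Speech recognition

    • Connects to the iFlytek speech recognition service over WebSocket
    • Sends the audio data in segments
    • Uses Base64 encoding and HMAC-SHA256 request signing
  4. Network services

    • Wi-Fi connection (2.4 GHz)
    • NTP network time synchronization
    • WebSocket communication (ws:// in this sample)
  5. Interaction and control

    • Sending '1' over the serial port triggers recording
    • Recognition results are printed to the serial port
    • LED status indication (no concrete logic implemented yet)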
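The request-signing step in the speech recognition module starts from two plain strings: an RFC 1123 date and the text that is fed to HMAC-SHA256. The sketch below reproduces just that string handling in standard C++ so it can be checked on a PC. `rfc1123Date` and `signatureOrigin` are illustrative names, and `gmtime_r` is a POSIX call, so this assumes a POSIX-like toolchain.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Format a Unix timestamp as an RFC 1123 date in GMT,
// e.g. "Thu, 01 Jan 1970 00:00:00 GMT" for timestamp 0.
std::string rfc1123Date(std::time_t unixTime)
{
  std::tm tmUtc{};
  gmtime_r(&unixTime, &tmUtc); // POSIX, re-entrant variant of gmtime
  char buf[40];
  std::strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S GMT", &tmUtc);
  return std::string(buf);
}

// Text that is signed with HMAC-SHA256 using the APISecret:
// three lines separated by '\n', with no trailing newline.
std::string signatureOrigin(const std::string &host, const std::string &date,
                            const std::string &path)
{
  return "host: " + host + "\n" +
         "date: " + date + "\n" +
         "GET " + path + " HTTP/1.1";
}
```

For example, `rfc1123Date(1740439690)` gives `"Mon, 24 Feb 2025 23:28:10 GMT"`, the same text that appears next to that epoch value in the serial log further down. A mismatch of even one character in the signed text changes the signature, so comparing these strings on a PC first is a cheap way to debug a rejected handshake.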

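3. Workflow

  1. After power-up, connect to Wi-Fi and the NTP server
  2. Initialize the audio capture system
  3. Wait for '1' on the serial port to trigger recording
  4. Record 3 seconds of audio into PSRAM
  5. Send the audio to the iFlytek cloud service in segments
  6. Receive the recognition results and concatenate them into the full text
  7. Print the recognition result to the serial port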
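Step 5 of the workflow, splitting the recording into frames, can be sketched on its own. The code below is a simplified stand-in for the bookkeeping in STTsend(): `Chunk` and `planChunks` are made-up names, and a recording that fits in a single frame is simply marked as a first frame (closing such a session with a final status-2 frame is not handled here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk
{
  std::size_t offset; // byte offset into the PCM buffer
  std::size_t length; // number of bytes in this frame
  int status;         // 0 = first, 1 = middle, 2 = last
};

// Split `totalBytes` of audio into frames of at most `frameBytes` bytes.
std::vector<Chunk> planChunks(std::size_t totalBytes, std::size_t frameBytes)
{
  std::vector<Chunk> chunks;
  if (totalBytes == 0 || frameBytes == 0)
  {
    return chunks;
  }
  for (std::size_t offset = 0; offset < totalBytes; offset += frameBytes)
  {
    std::size_t remaining = totalBytes - offset;
    std::size_t length = remaining < frameBytes ? remaining : frameBytes;
    int status = 1;
    if (offset == 0)
    {
      status = 0; // first frame
    }
    else if (offset + length == totalBytes)
    {
      status = 2; // last frame
    }
    chunks.push_back({offset, length, status});
  }
  return chunks;
}
```

With the 48640 samples (97280 bytes) from the serial log and 10240-byte frames, this yields 10 frames, the last one 5120 bytes long, which matches the `input: 9 , status: 2` line printed by the sketch.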

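4. Typical applications

  • Voice control of smart devices
  • Voice memo systems
  • Real-time speech-to-text devices
  • Voice interaction interfaces for IoT devices

5. Key techniques

  • Low-latency audio capture: I2S with DMA transfers
  • Large-memory management: audio data is stored in PSRAM
  • Secure communication: HMAC-SHA256 signature verification
  • Real-time protocol: bidirectional WebSocket communication
  • Segmented audio handling: the audio is sent as multiple frames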
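One detail behind the signed connection URL is query-string encoding. `formatDateForURL` in main.cpp only rewrites spaces, commas, and colons, while the Base64 `authorization` value can also contain `+`, `/`, and `=`. A general percent-encoder is one way to cover all of these. The sketch below is such an encoder in standard C++; `percentEncode` is an illustrative name, and whether the server needs the stricter encoding has not been verified here.

```cpp
#include <cassert>
#include <string>

// Percent-encode every byte except the RFC 3986 "unreserved" characters
// (letters, digits, '-', '.', '_', '~').
std::string percentEncode(const std::string &in)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(in.size() * 3);
  for (unsigned char c : in)
  {
    bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                      c == '_' || c == '~';
    if (unreserved)
    {
      out += static_cast<char>(c);
    }
    else
    {
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}
```

Note that this encodes a space as `%20`, whereas `formatDateForURL` uses `+`; both forms are commonly accepted in query strings.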

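6. Features still to be completed

  • LED status indication logic
  • Error handling
  • Local audio cache management
  • Offline speech recognition support

The code implements the full path from hardware audio capture to cloud speech recognition and can serve as a base framework for voice-interactive IoT devices.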

4. Receiving data

Use the following platformio.ini build and upload configuration for reference:

; PlatformIO Project Configuration File
;
;   Build options: build flags, source filter
;   Upload options: custom upload port, speed and extra flags
;   Library options: dependencies, extra library storages
;   Advanced options: extra scripting
;
; Please visit documentation for the other options and examples
; https://docs.platformio.org/page/projectconf.html

[env:esp32s3-cam]
platform = espressif32
board = esp32-s3-devkitc-1
framework = arduino
board_build.arduino.memory_type = dio_opi
board_upload.flash_size = 16MB
board_build.partitions = default_16MB.csv
board_build.mcu = esp32s3
monitor_speed = 115200
upload_speed = 921600
lib_deps = 
	bblanchon/ArduinoJson@^7.3.0
	arduino-libraries/NTPClient@^3.2.1
	gilmaimon/ArduinoWebsockets@^0.5.4
    adafruit/Adafruit NeoPixel @ ^1.12.2
build_flags = 
	-DBOARD_HAS_PSRAM
	-mfix-esp32-psram-cache-issue

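Type the character "1" in the serial monitor with no line ending selected and press Enter; the board then records for 3 s. Wait for iFlytek online speech recognition to respond. The code in the previous step already handles receiving the data; the returned data is listed here.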

............................. CONNECTED
23:28:10:629 -> Epoch Time: 1740439690
23:28:10:629 -> Mon, 24 Feb 2025 23:28:10 GMT
23:28:10:631 -> Setup I2S ...
23:29:30:973 ---- 已发送 utf8 编码消息: "1" ----
23:29:31:023 -> Recording...
23:29:33:623 -> Total bytes read: 48640
23:29:33:623 -> Recording complete.
23:29:33:686 -> Epoch Time: 1740439774
23:29:33:686 -> Mon, 24 Feb 2025 23:29:34 GMT
23:29:33:692 -> Connecting to server.
23:29:33:857 -> Connected!
23:29:33:862 -> input: 0 , status: 0 
23:29:35:172 -> input: 9 , status: 2 
23:29:35:517 -> Got Message: {"code":0,"message":"success","sid":"iat000decb8@dx1953893f1937024802","data":{"result":{"sn":1,"ls":false,"bg":0,"ed":0,"vad":{"ws":[{"bg":61,"ed":278,"eg":48.86}]},"ws":[{"bg":25,"cw":[{"w":"你好","sc":0}]},{"bg":73,"cw":[{"sc":0,"w":"呀"}]},{"bg":97,"cw":[{"sc":0,"w":","}]},{"bg":97,"cw":[{"sc":0,"w":"我"}]},{"bg":121,"cw":[{"sc":0,"w":"是"}]},{"bg":137,"cw":[{"w":"鹏","sc":0}]},{"bg":161,"cw":[{"w":"鹏","sc":0}]}]},"status":0}}
23:29:35:555 -> Got Message: {"code":0,"message":"success","sid":"iat000decb8@dx1953893f1937024802","data":{"result":{"bg":0,"ed":0,"ws":[{"cw":[{"sc":0,"w":"。"}],"bg":188}],"sn":2,"ls":true},"status":2}}
23:29:35:572 -> 语音识别:你好呀,我是鹏鹏。
23:29:51:276 ---- 已发送 utf8 编码消息: "1" ----
23:29:51:319 -> Recording...
23:29:53:843 -> Total bytes read: 48640
23:29:53:843 -> Recording complete.
23:29:53:846 -> Epoch Time: 1740439794
23:29:53:849 -> Mon, 24 Feb 2025 23:29:54 GMT
23:29:53:852 -> Connecting to server.
23:29:54:091 -> Connected!
23:29:54:096 -> input: 0 , status: 0 
23:29:55:339 -> input: 9 , status: 2 
23:29:55:604 -> Got Message: {"code":0,"message":"success","sid":"iat000dcbb5@dx195389440c7a140802","data":{"status":0,"result":{"sn":1,"ls":false,"bg":0,"ed":0,"vad":{"ws":[{"bg":71,"ed":300,"eg":46.19}]},"ws":[{"bg":25,"cw":[{"sc":0,"w":"快"}]},{"bg":45,"cw":[{"sc":0,"w":"到"}]},{"bg":65,"cw":[{"sc":0,"w":"晚上"}]},{"bg":105,"cw":[{"sc":0,"w":"12点"}]},{"bg":169,"cw":[{"sc":0,"w":"休息"}]},{"bg":209,"cw":[{"sc":0,"w":"吧"}]}]}}}
23:29:55:640 -> Got Message: {"code":0,"message":"success","sid":"iat000dcbb5@dx195389440c7a140802","data":{"result":{"sn":2,"ls":true,"bg":0,"ed":0,"ws":[{"bg":220,"cw":[{"sc":0,"w":"。"}]}]},"status":2}}
23:29:55:656 -> 语音识别:快到晚上12点休息吧。

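
The response is very fast, about 2 s. When the data is sent successfully, correct recognition results come back; of course, when the audio signal is poor, the recognition will be less accurate.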

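5. Summary

This article used an ESP32S3 development board to connect to iFlytek for online speech recognition. The microphone module provides the voice input, and sending the character "1" over the serial port triggers audio capture and upload. This lets a device perceive the outside world, make sense of its organic and inorganic surroundings, and be put to creative and productive use in a sound, scientific way, contributing a little to the progress of society. 🤣🤣🤣

  1. I will keep updating the corresponding blog column and look forward to your likes, favorites, and comments!!! 🎉🎉🎉
  2. If Pengpeng (鹏鹏) got anything wrong, please leave a comment and set me straight!!! 👍👍👍
  3. My 🐧🐧🐧 group is promoted below; like-minded friends are welcome to join, and I look forward to exchanging ideas with you 😘😘😘

Reference: ESP32直接对话大语言模型人工智能语音助手 (an ESP32 AI voice assistant that talks directly to a large language model)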