I upgraded my previous ESP32-CAM Semantic Search Wearable with a better physical design and a voice assistant powered by:
- A local multimodal large language model
- Groq with Retrieval Augmented Generation (RAG)
- The VOSK speech recognition model
https://youtube.com/shorts/sPIqjVPMnrE
When the wearable is switched on and connected to the software, it takes a picture every 5 seconds and sends it over Bluetooth Low Energy (BLE). Each received picture is passed to a local multimodal language model running on Ollama, which generates a description of the image. The image is saved to the Pictures folder, and its file name and description are added to a vector database and to a data frame of past image files and descriptions.

To ask the wearable a question, the user presses the wire. The wearable then turns off the camera and starts recording audio packets, which are sent over BLE and transcribed by VOSK into the user's query. The query is vectorized to retrieve relevant context from the vector database, and the question and context are sent to Groq, which returns an answer very quickly. A text-to-speech model then reads the answer aloud, and the top 5 most relevant images are displayed on the screen. The user can ask follow-up questions or press the wire again to resume capturing images. When the user closes the software, the image file names and descriptions are saved to image-descriptions.csv.
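The retrieval step described above can be sketched in plain Python. This is a minimal, self-contained stand-in, not the repo's code: the source does not name the vector database or embedding model, so a toy word-count `embed()` and an in-memory list take their place, and the names `ImageMemory`, `embed`, `cosine`, `build_prompt`, the CSV column names, and the prompt wording are all hypothetical. The flow matches the description: index each file name with its description, rank stored descriptions by cosine similarity to the query, keep the top 5, build the context for the LLM, and write everything to image-descriptions.csv (the csv module stands in for the data frame).

```python
import csv
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Hypothetical toy "embedding": lowercase word counts.

    Stands in for whatever embedding model feeds the real vector database.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ImageMemory:
    """Hypothetical in-memory store of (file name, description, vector) rows."""

    rows: list = field(default_factory=list)

    def add(self, file_name: str, description: str) -> None:
        """Index a new image description (done after each captured picture)."""
        self.rows.append((file_name, description, embed(description)))

    def search(self, query: str, k: int = 5) -> list:
        """Return the k (file_name, description) pairs most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [(name, desc) for name, desc, _ in ranked[:k]]

    def save_csv(self, path: str = "image-descriptions.csv") -> None:
        """Write file names and descriptions to CSV, as happens on exit."""
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["file_name", "description"])
            for name, desc, _ in self.rows:
                writer.writerow([name, desc])


def build_prompt(question: str, hits: list) -> str:
    """Join retrieved descriptions and the question into one prompt for the LLM."""
    context = "\n".join(f"- {name}: {desc}" for name, desc in hits)
    return f"Context from past images:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    memory = ImageMemory()
    memory.add("img_001.jpg", "a red coffee mug on a wooden desk")
    memory.add("img_002.jpg", "a set of keys next to a laptop")
    hits = memory.search("where did I leave my keys", k=1)
    print(hits)  # [('img_002.jpg', 'a set of keys next to a laptop')]
    print(build_prompt("where did I leave my keys", hits))
```

In the actual pipeline, a prompt like this would be sent to Groq, and the file names in the retrieved hits would pick the images shown on screen; a real embedding model and vector database would replace the word-count vectors.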
- 1 XIAO ESP32 S3 Sense board
- 1 220 mAh LiPo battery
- 1 3-way switch
- 1 3D Printed Case + Lid
- 2 Wires
- 1 Binder clip attached to the case with some sticky tack for clipping onto a pair of glasses
Solder the components as follows:
- One wire connects to D0
- The other wire connects to GND or BAT-; twist it together with the first wire (this helps register touch presses)
- Solder the leftmost prong of the 3-way switch to the BAT+ pin
- Solder the negative battery wire to the BAT- pin and the positive battery wire to the middle prong of the 3-way switch
- Clone the repo with `git clone https://github.com/xanderchinxyz/Voice-Assistant-Camera-Wearable.git`
- In the root folder, install the required libraries and dependencies with `pip install -r requirements.txt`
- Create a `.env` file in the root directory and add your Groq API key by pasting `GROQ_API_KEY="YOUR_API_KEY_HERE"` into the file
- Install Ollama and download the moondream2 model by running `ollama pull moondream` in a terminal
- Install the firmware onto the XIAO ESP32 S3 board:
  - Go to the firmware folder and open the `.ino` file in the Arduino IDE
  - Follow the software preparation steps to set up the Arduino IDE for the XIAO ESP32S3 board:
    - Add the ESP32 board package to your Arduino IDE:
      - Navigate to File > Preferences, and fill "Additional Boards Manager URLs" with the URL: `https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json`
      - Navigate to Tools > Board > Boards Manager..., type the keyword `esp32` in the search box, select the latest version of `esp32`, and install it.
    - Select your board and port:
      - At the top of the Arduino IDE, select the port (likely COM3 or higher).
      - Search for `xiao` in the development board list on the left and select `XIAO_ESP32S3`.
  - Before you flash, go to the "Tools" drop-down in the Arduino IDE and make sure "PSRAM:" is set to "OPI PSRAM"
  - Put the XIAO ESP32S3 board in BOOT mode (press and hold the BOOT button, click the RESET button, then release the BOOT button) and upload the firmware
- Run `main.py`:
  - Make sure the Ollama server is on by running `ollama serve` in a terminal
  - The first time you run the script, it will probably take a while to download the VOSK model
  - To connect to the device, click the window that pops up, press the "Select BLE Device" button, and wait for the device to appear (don't mind the UI 😅)
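Two prerequisites from the steps above, the Groq key in `.env` and a running Ollama server, can be checked before launching `main.py`. This is a hypothetical helper, not part of the repo: `read_env_file` is a stdlib-only stand-in for a `.env` loader (the repo may well use a library such as python-dotenv), and `ollama_is_running` assumes Ollama's default address, `http://localhost:11434`, which answers a plain HTTP GET when `ollama serve` is up.

```python
import urllib.request
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY="VALUE" lines, skipping blanks and # comments.

    Hypothetical stand-in for a .env loader; it ignores edge cases such as
    escape sequences or multi-line values.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and one style of quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the (assumed) Ollama base URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    env = read_env_file(".env") if Path(".env").exists() else {}
    key = env.get("GROQ_API_KEY", "")
    print("GROQ_API_KEY set:", bool(key) and key != "YOUR_API_KEY_HERE")
    print("Ollama server reachable:", ollama_is_running())
```

Run from the repo root, it reports whether a non-placeholder key is present and whether anything answered on the default Ollama port.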
Thank you to OpenGlass for open-sourcing their code, which helped me create the embedded software for the XIAO ESP32 S3.