To keep client-side resource usage minimal, scale well, and make it easy to incorporate new features in the future, this project's server is implemented as a microservice architecture. Each integration is a decoupled microservice running in a containerized environment with its models and other resources preloaded, so no loading time is paid during inference. The microservices are currently orchestrated with Docker Compose.
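As a rough illustration of the preloading idea, a service of this kind can load its model once at container startup and then answer requests over a WebSocket with no per-request load time. The sketch below is hypothetical (it is not the project's actual serve.py) and assumes 16 kHz mono 16-bit PCM input:

```python
# Hypothetical sketch of a serve.py-style STT service (not the project's
# actual module). The Whisper model is loaded once at container startup,
# so no loading time is paid per request.
import asyncio

import numpy as np
import websockets
import whisper

# Preload the model while the container starts, before serving any requests.
MODEL = whisper.load_model("base")


async def transcribe(websocket):  # single-arg handlers need websockets >= 10.1
    """Receive raw PCM audio chunks and reply with the transcribed text."""
    async for chunk in websocket:
        # Convert little-endian int16 PCM to the float32 waveform Whisper expects.
        audio = np.frombuffer(chunk, dtype=np.int16).astype(np.float32) / 32768.0
        # Run the blocking transcription in a worker thread off the event loop.
        result = await asyncio.to_thread(MODEL.transcribe, audio)
        await websocket.send(result["text"].strip())


async def main():
    async with websockets.serve(transcribe, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```

Baking the model into the image and loading it at startup trades longer container startup and a larger memory footprint for fast, predictable inference.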
Details on current integrations:
- offline Speech-to-Text with OpenAI Whisper
  - configured for processing audio device input streams in real time
  - custom Docker image with a minimal serve.py module
  - WebSocket protocol support with the websockets library
  - asynchronous voice recording (client side) and transcription (server side) with the asyncio library (see the client sketch after this list)
- offline Text-to-Speech with MycroftAI Mimic3
- offline Natural Language Understanding with Rasa Open Source and SpaCy
- real-time Speech-To-Speech interaction
- text summarization
- ask for the current time in any country / city / state supported by SpaCy, or just the local time if no location is specified
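To give a sense of how the client side fits together, here is a hypothetical sketch of asynchronous voice recording and transcription over a WebSocket. The sounddevice library, the ws://localhost:8765 endpoint, and the fixed 2-second chunking are illustrative assumptions, not the project's actual code:

```python
# Hypothetical client sketch: record microphone audio with sounddevice and
# stream it to the STT service over a WebSocket, printing transcripts as
# they arrive. Library and endpoint choices are assumptions for illustration.
import asyncio

import sounddevice as sd
import websockets

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 2     # send a chunk of audio every 2 seconds (arbitrary choice)


async def stream_microphone(uri="ws://localhost:8765"):
    loop = asyncio.get_running_loop()
    audio_queue: asyncio.Queue[bytes] = asyncio.Queue()

    def on_audio(indata, frames, time_info, status):
        # Called from sounddevice's audio thread; hand bytes to the event loop.
        loop.call_soon_threadsafe(audio_queue.put_nowait, bytes(indata))

    async with websockets.connect(uri) as ws:
        with sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1,
                               dtype="int16",
                               blocksize=SAMPLE_RATE * CHUNK_SECONDS,
                               callback=on_audio):
            while True:
                await ws.send(await audio_queue.get())  # send recorded chunk
                print(await ws.recv())                  # print its transcript


if __name__ == "__main__":
    asyncio.run(stream_microphone())
```

Sending chunks and awaiting transcripts share the event loop while the audio callback keeps capturing in the background, so recording and transcription overlap rather than block each other.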
To get started, clone the repository, bring up the services, and train the NLU model:

```sh
git clone https://github.com/mcleonte/zaida.ai.git
cd zaida.ai
docker compose up  # add the --detach flag, or run this in another terminal
./nlu-train.sh
```
Then install the client with Poetry and run it in either of two ways:

```sh
poetry install

# Option 1
poetry run zaida

# Option 2
poetry shell  # or "source .venv/bin/activate"
zaida
```
If `./nlu-train.sh` fails with a ConnectionError, wait a few more seconds for all the Docker services to start up, as the NLU service takes longer than the others. After initialization, interactions should feel real-time.
This is still a very new project: I've nearly finished polishing the STT and TTS integrations, but there isn't much on the NLU side yet, which is what I want to focus on next. So far, I also plan on developing features for:
- daily tasks and workflows
- window & environment management
- filesystem & browser navigation
and many others will follow.