Skip to content
Yi-Ting Chiu edited this page Dec 23, 2024 · 8 revisions

Welcome

This is Open-LLM-VTuber, an application that allows you to talk to (and interrupt) any LLM by voice (hands-free) locally with a Live2D talking face.

⚠️ This project is in its early stages and is currently under active development. Features are unstable, code is messy, and breaking changes will occur. The main goal of this stage is to build a minimum viable prototype using technologies that are easy to integrate.

⚠️ If you want to run this program on a server and access it remotely on your laptop, the microphone on the front end will only launch in a secure context (a.k.a. https or localhost). See MDN Web Doc. Therefore, you might want to configure https with a reverse proxy or launch the front end locally and connect to the server via websocket (untested). Open the static/index.html with your browser and set the ws URL on the page.

⚠️ This project is not very easy to set up at this moment. The modularity and the fact that this is a Python project that uses a ton of AI stuff that needs to be executed locally creates a lot of headaches. I'm working on an installation script with recommended configuration to simplify the process, but it's not complete yet.

Prerequisite

Have ffmpeg installed on your computer.

Python version >= 3.10, < 3.13 (there are currently dependencies installation issues in Python 3.13. If you encounter that, just go 3.12 or lower and it should work)

Basic knowledge

All of the settings are in conf.yaml. You can (and probably will) do many things there, and there are also comments in that file explaining what those settings mean.

Setup steps

  1. Clone the repo
  2. [optional] Create a virtual environment like conda or venv for this project
  3. Install basic dependencies with pip install -r requirements.txt
  4. Setup the LLM
  5. Setup your desired ASR (Automatic Speech Recognition)
  6. Setup your desired TTS (Text to Speech)
  7. Run it

Step 1: Download the repo

Find a good spot on your computer and clone the repository or download the latest release.

git clone https://github.com/t41372/Open-LLM-VTuber

Nice. now go to GitHub and star this project if you haven't done so or you'll &&Eujehruedjhnoeire4939#pE$

Step 2: [optional] Create a virtual environment for this project

It is optional, yet I highly recommend you to create a virtual environment for this project.

This project was developed using Python 3.10.13. Python 3.11 is tested. Some other versions will probably work, too, but they are untested.

If you don't know what a virtual environment is, here is a quick explanation:

A Python virtual environment (venv) is a folder that contains the Python interpreter, third-party libraries, and other scripts. Venvs are isolated from other virtual environments, so changes to dependencies don't affect other virtual environments or system-wide libraries.

-- dataquest

Why?

The reason why I highly recommend you use a virtual environment for this project is that this will make your life a ton easier. This project uses a lot of dependencies, and dependency conflicts happen very often. Using a virtual environment to isolate them saves your hair.

Venv

If you don't know what conda is, we can use venv, which is built into Python and is pretty nice.

# create a virtual environment
python -m venv open-llm-vtuber

To activate the virtual environment, run the following command:

On Windows

open-llm-vtuber\Scripts\activate

On macOS/Linux

source open-llm-vtuber/bin/activate

or conda

If you know what conda is, then you know what to do. Here is the command I personally use. If you don't know what conda is, I recommend you use venv.

# create a conda environment in the project directory
conda create -p ./.conda python="3.10.4"
# activate this environment
conda activate ./.conda

Step 3: Install basic dependencies

Run the following in the root directory of this project to install the dependencies.

pip install -r requirements.txt # Run this in the project directory

Step 4: Set up the LLM

You need to have Ollama or any other OpenAI-API-Compatible backend ready and running. You can use llama.cpp, vLLM, LM Studio, groq, OpenAI, and so much more.

MemGPT

If you want to use long-term memory with MemGPT, you will set MemGPT as your LLM backend instead of the ones mentioned above. Check out MemGPT section for more information (it's not very easy unless you already know how to run MemGPT, so I recommend you start with ollama or other OpenAI-Compatible LLM backends instead).

Ollama and OpenAI Compatible LLM Backend

Prepare an LLM you like and have a running LLM inference server like ollama.

In conf.yaml file, under the option ollama, you can edit the configuration for all OpenAI Compatible LLM inference backend.

Here is the setting in conf.yaml

#  ============== LLM Backend Settings ===================

# Provider of LLM. Choose either "ollama" or "memgpt" (or "fakellm for debug purposes")
# "ollama" for any OpenAI Compatible backend. "memgpt" requires setup
LLM_PROVIDER: "ollama"


# Ollama & OpenAI Compatible inference backend
ollama:
  BASE_URL: "http://localhost:11434/v1"
  LLM_API_KEY: "somethingelse"
  ORGANIZATION_ID: "org_eternity"
  PROJECT_ID: "project_glass"
  ## LLM name
  MODEL: "llama3.1:latest"
  # system prompt is at the very end of this file
  VERBOSE: False

If you don't use LLM_API_KEY, ORGANIZATION_ID, and PROJECT_ID, just leave them as it is.

If you are so excited right now that you want to try this project without voice interactions, you can set LIVE2D, VOICE_INPUT_ON, and TTS_ON to False in the conf.yaml to talk with the LLM by typing with no voice nor Live2D. Remember to turn them back on later on. Don't change the LIVE2D, VOICE_INPUT_ON, and TTS_ON options. These options were designed for CLI mode, which will be removed in the next major version v1.0.0. The LIVE2D options was deprecated and made useless since v0.2.0, and after the release of v0.4.0, users can now directly interact with text in the browser, which makes the VOICE_INPUT_ON options useless. Using the web frontend with these options may lead to unpredictable outcomes. Luckily, those options along with the CLI mode will be removed in the next major version v1.0.0, and the whole documentation will be rewritten, so yeah, there should be less confusion in the future.

Step 5: Set up Automatic Speech Recognition (ASR)

This project supports many different speech recognition models and providers. Check out the ASR section for installation instructions.

In general, here are the steps to set up speech recognition:

  1. Install the dependencies
  2. Edit the configurations of the ASR you use in conf.yaml. You can usually change the language or model there if supported.
  3. Set ASR_MODEL to the ASR of your choice.

As of writing, this project supports the following ASR:

Some recommendations:

If you don't care it connects to the internet on launch (will be fixed in the future), I recommend FunASR with SenseVoiceSmall. It's very fast and the accuracy is pretty good.

If you want something that works offline, I recommend Faster-Whisper if you have an Nvidia GPU, and Whisper-CPP with coreML accleration if you are using macOS.

You can also use Azure Speech Recognition if you happen to have the API key.

⚠️ If you want to run this application (the server) inside a container or on a remote machine and access the webui with local device, you need to turn MIC_IN_BROWSER to True in the conf.yaml. There are more things you need to consider, and it's at the top of this page.

Step 6: Set up Text-To-Speech (TTS)

Check out TTS section for instruction of setting up the TTS you want.

In general, here are the steps to set up a text-to-speech service:

  1. Install the dependencies
  2. Edit the configurations of the TTS you use in conf.yaml. You can usually change the language or speakers there if supported.
  3. Set TTS_MODEL to the TTS of your choice.

Here are some supported TTS as of writing:

  • py3-tts (Local, it uses your system's default TTS engine)
  • bark (Local, very resource-consuming)
  • CosyVoice (Local, very resource-consuming)
  • MeloTTS (local, fast)
  • Edge TTS (online, no API key required)
  • Azure Text-to-Speech (online, API Key required)

Step 7: Run the program

For now, if you are using live2D and everything we mentioned above, here are the steps to run the program:

  1. Run server.py
  2. Open localhost:12393 with your browser (default but you can change it in conf.yaml)
  3. Run main.py (no longer needed)
  4. Talk to the LLM once the Live2D model is loaded.

If you just want to talk and don't want the Live2D and browser that kind of stuff, you can just run the main.py for cli mode.

Some related settings in conf.yaml you might be interested:

  • Turn off the live2D (and the web UI so you don't need the server.py) at LIVE2D
  • Turn off Speech Recognition and start typing in the terminal at VOICE_INPUT_ON
  • Let the mic listen in the browser instead of the terminal at MIC_IN_BROWSER
  • Turn off TTS at TTS_ON
  • Get TTS speaks everything at once at SAY_SENTENCE_SEPARATELY
  • Change/Edit persona prompt at PERSONA_CHOICE and DEFAULT_PERSONA_PROMPT_IN_YAML
  • Change the host and port that the server is listening to at HOST and PORT
  • and VERBOSE

Some models will be downloaded during your first launch, which may take a while.