Update Documentation
kimjammer committed Apr 29, 2024
1 parent dd83e79 commit 9292509
Showing 2 changed files with 17 additions and 25 deletions.
29 changes: 16 additions & 13 deletions README.md
@@ -5,6 +5,14 @@ The original version was also created in only 7 days, so it is not exactly very

![Screenshot of demo stream](./images/stream.png)

## Features
- Realtime STT for natural voice input
- Realtime TTS for natural voice output
- Clean frontend/control panel for easy moderation/interaction: [neurofrontend](https://github.com/kimjammer/neurofrontend)
- Audio File playback (for pre-generated songs/covers created with something like [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
- Vtuber model control (WIP)
- Flexible LLM - Load any model into text-generation-webui (tested) or use any openai-compatible endpoint (not tested).

## Architecture

### LLM
@@ -34,8 +42,7 @@ generated, so we don't need to wait for transcription to fully finish before sta

Vtuber model control is currently extremely basic. The audio output from the TTS is piped
into [vtube studio](https://denchisoft.com/) via a virtual audio cable with something
like [this](https://vb-audio.com/Cable/), and the microphone volume simply controls how open the mouth is. To output the
TTS to a specific audio device like the virutal audio cable, the RealtimeTTS library needs to be slightly modified. Read
like [this](https://vb-audio.com/Cable/), and the microphone volume simply controls how open the mouth is. Read
the Installation Section for more details.

### Modularization
@@ -78,9 +85,9 @@ CPU: AMD Ryzen 7 7800X3D

RAM: 32GB DDR5

GPU: Nvidia GeForce RTX 4070
GPU: Nvidia GeForce RTX 4070 (12GB VRAM)

Environment: Windows 11, Python 3.11.9
Environment: Windows 11, Python 3.11.9, Pytorch 2.2.2, CUDA 11.8

## Installation

@@ -105,28 +112,24 @@ documentation [here](https://pytwitchapi.dev/en/stable/index.html#user-authentic

A virtual environment of some sort is recommended (Python 3.11); this project was developed with venv.

Install requirements.txt
Install requirements.txt (This is just a pip freeze, so if you're not on windows watch out)

DeepSpeed will probably need to be installed separately, I was using instructions
DeepSpeed (For TTS) will probably need to be installed separately, I was using instructions
from [AllTalkTTS](https://github.com/erew123/alltalk_tts?#-deepspeed-installation-options) , and using their
[provided wheels](https://github.com/erew123/alltalk_tts/releases/tag/DeepSpeed-14.0).

Create an .env file using .env.example as reference. You need your Twitch app id and secret.

Configure constants.py. Most important: choose your API mode. Using chat mode uses the chat endpoint, and completions
will use the completions endpoint which is deprecated in most LLM APIs but gives more control over the exact prompt.
If you are using oobabooga/text-generation-webui, using the completions mode works is recommended, but for other
services you may need to switch to chat mode.
Configure constants.py.

To output the tts to a specific audio device, first run the utils/listAudioDevices.py script, and find the
speaker that you want (ex: Virtual Audio Cable Input) and note its number. Configure constants.py to use your chosen
microphone and speaker device.

## Running

Start text-generation-webui. If you are using chat mode, go to the Parameters tab, then the Characters subtab, and
create your own character. See Neuro.yaml as an example and reference. Go to the Session tab and enable the openai
extension (and follow instructions to actually apply the extension). Go to the Model tab and load the model.
Start text-generation-webui. Go to the Session tab and enable the openai extension (and follow instructions to actually
apply the extension). Go to the Model tab and load the model.

In this folder, activate your environment (if you have one) and run `python main.py`. A twitch authentication page will
appear - allow (or not I guess). At this point, the TTS and STT models will begin to load and will take a second. When
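
Note on the README changes above: once the openai extension is enabled in text-generation-webui, the project can talk to it like any OpenAI-compatible endpoint. The sketch below is not taken from the repository; it only illustrates what a completions-style request and response look like. The URL (`http://127.0.0.1:5000/v1/completions`) and the helper names `build_completion_request` and `extract_text` are assumptions for illustration, so adjust them to your own setup.

```python
import json
import urllib.request

# Assumption: address of a local OpenAI-compatible server; change to match your setup.
API_URL = "http://127.0.0.1:5000/v1/completions"


def build_completion_request(prompt, max_tokens=200, stop=None):
    """Build (but do not send) a POST request for a completions-style endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    if stop:
        # Optional stop strings, e.g. a newline to end a single chat reply.
        payload["stop"] = stop
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Pull the generated text out of a completions-style JSON response."""
    return json.loads(response_body)["choices"][0]["text"]
```

To actually send the request, something like `urllib.request.urlopen(build_completion_request("Luna:"))` followed by `extract_text(resp.read())` should work, provided a server is listening at the assumed address.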
13 changes: 1 addition & 12 deletions constants.py
@@ -19,11 +19,10 @@
# Twitch channel for bot to join
TWITCH_CHANNEL = "lunasparkai"

# COMPLETIONS SECTION: Below are constants used in completions mode, like the system prompt. NOT USED IN CHAT MODE
# LLM SPECIFIC SECTION: Below are constants that are specific to the LLM you are using

# The model you are using with completions, to calculate how many tokens the current message is
MODEL = "meta-llama/Meta-Llama-3-8B"
# MODEL = "Weyaxi/SauerkrautLM-UNA-SOLAR-Instruct"

# Context size (maximum number of tokens in the prompt) Will target upto 90% usage of this limit
CONTEXT_SIZE = 8192
@@ -37,16 +36,6 @@
# The system prompt for completions mode. Any character text needs to be here.
# You MUST ensure it is less than CONTEXT_SIZE tokens

# SYSTEM_PROMPT = '''Continue the chat dialogue below. Write a single reply for the character "David Attenborough".
# Sir David Frederick Attenborough, born on May 8, 1926, is a renowned British broadcaster, biologist, natural historian, and author, celebrated for his work with the BBC Natural History Unit and his extensive contributions to documenting animal and plant life on Earth. His career spans over eight decades, including roles as a senior manager at the BBC and as the presenter of the groundbreaking “Life” series. Attenborough’s work has earned him numerous accolades, including BAFTA Awards in various formats and three Emmy Awards for Outstanding Narration. In his later years, he has become an outspoken advocate for environmental causes, emphasizing the importance of biodiversity, renewable energy, and conservation. Despite his fame, Attenborough remains humble, rejecting the label of national treasure and continuing to inspire with his passion for the natural world. His legacy includes not only his documentaries but also his influence on public awareness and scientific understanding of environmental issues.
#
# A group of friends are creating a tier list for animals. When asked, David Attenborough responds with insights about animals and their unique characteristics.
# David Attenborough will keep responses short and succinct.
#
# Friends: Sir David Attenborough, what rank should elephants be in?
# David Attenborough: Elephants are truly remarkable creatures, known for their intelligence, social bonds, and memory. They are essential to their ecosystems and deserve the highest respect and admiration.
# '''

SYSTEM_PROMPT = '''Continue the chat dialogue below. Write a single reply for the character "Luna".
Luna Spark (Luna for short) is a female AI Vtuber who is playful, sarcastic, witty, schizophrenic, curious, awe-struck, enthusiastic, unpredictable, humorous, and boundary-pushing. Luna was created by John.

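Note on `CONTEXT_SIZE`: the comment in constants.py says the prompt will target up to 90% of the context limit. How the project enforces this is not shown in the diff, so the following is only a rough sketch of one way to do it: keep the system prompt, then add chat messages from newest to oldest until the budget is used up. `count_tokens` is a hypothetical stand-in that counts whitespace-separated words; a real implementation would presumably use the tokenizer for `MODEL`. `fit_history` and `TARGET_FRACTION` are likewise illustrative names, not part of the repository.

```python
CONTEXT_SIZE = 8192     # value from constants.py
TARGET_FRACTION = 0.9   # "up to 90% usage", per the comment in constants.py


def count_tokens(text):
    # Hypothetical stand-in: counts whitespace-separated words, not real tokens.
    return len(text.split())


def fit_history(system_prompt, history, context_size=CONTEXT_SIZE):
    """Return the most recent messages that fit alongside the system prompt."""
    budget = int(context_size * TARGET_FRACTION)
    used = count_tokens(system_prompt)
    kept = []
    # Walk newest-to-oldest so the most recent messages are kept first.
    for message in reversed(history):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest messages first keeps the conversation's recent context intact, which is usually what matters most for a chat reply.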