Update Documentation

kimjammer · Apr 29, 2024 · 9292509
1 parent dd83e79
Showing 2 changed files with 17 additions and 25 deletions.
diff --git a/README.md b/README.md
@@ -5,6 +5,14 @@ The original version was also created in only 7 days, so it is not exactly very
 
 ![Screenshot of demo stream](./images/stream.png)
 
+## Features
+- Realtime STT for natural voice input
+- Realtime TTS for natural voice output
+- Clean frontend/control panel for easy moderation/interaction: [neurofrontend](https://github.com/kimjammer/neurofrontend)
+- Audio file playback (for pre-generated songs/covers created with something like [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI))
+- Vtuber model control (WIP)
+- Flexible LLM: load any model into text-generation-webui (tested) or use any OpenAI-compatible endpoint (not tested).
+
 ## Architecture
 
 ### LLM
@@ -34,8 +42,7 @@ generated, so we don't need to wait for transcription to fully finish before sta
 
 Vtuber model control is currently extremely basic. The audio output from the TTS is piped
 into [vtube studio](https://denchisoft.com/) via a virtual audio cable with something
-like [this](https://vb-audio.com/Cable/), and the microphone volume simply controls how open the mouth is. To output the
-TTS to a specific audio device like the virtual audio cable, the RealtimeTTS library needs to be slightly modified. Read
+like [this](https://vb-audio.com/Cable/), and the microphone volume simply controls how open the mouth is. Read
 the Installation Section for more details.
 
 ### Modularization
@@ -78,9 +85,9 @@ CPU: AMD Ryzen 7 7800X3D
 
 RAM: 32GB DDR5
 
-GPU: Nvidia GeForce RTX 4070
+GPU: Nvidia GeForce RTX 4070 (12GB VRAM)
 
-Environment: Windows 11, Python 3.11.9
+Environment: Windows 11, Python 3.11.9, PyTorch 2.2.2, CUDA 11.8
 
 ## Installation
 
@@ -105,28 +112,24 @@ documentation [here](https://pytwitchapi.dev/en/stable/index.html#user-authentic
 
 A virtual environment of some sort is recommended (Python 3.11); this project was developed with venv.
 
-Install requirements.txt
+Install requirements.txt (this is just a pip freeze, so watch out if you're not on Windows)
 
-DeepSpeed will probably need to be installed separately, I was using instructions
+DeepSpeed (for TTS) will probably need to be installed separately. I followed the instructions
from [AllTalkTTS](https://github.com/erew123/alltalk_tts?#-deepspeed-installation-options) and used their
 [provided wheels](https://github.com/erew123/alltalk_tts/releases/tag/DeepSpeed-14.0).
 
 Create an .env file using .env.example as reference. You need your Twitch app id and secret.
 
-Configure constants.py. Most important: choose your API mode. Using chat mode uses the chat endpoint, and completions
-will use the completions endpoint which is deprecated in most LLM APIs but gives more control over the exact prompt.
-If you are using oobabooga/text-generation-webui, using the completions mode is recommended, but for other 
-services you may need to switch to chat mode.
+Configure constants.py.
 
 To output the TTS to a specific audio device, first run the utils/listAudioDevices.py script, find the
 speaker that you want (e.g. Virtual Audio Cable Input), and note its number. Configure constants.py to use your chosen
 microphone and speaker device.
 
 ## Running
 
-Start text-generation-webui. If you are using chat mode, go to the Parameters tab, then the Characters subtab, and 
-create your own character. See Neuro.yaml as an example and reference. Go to the Session tab and enable the openai 
-extension (and follow instructions to actually apply the extension). Go to the Model tab and load the model.
+Start text-generation-webui. Go to the Session tab and enable the openai extension (and follow instructions to actually
+apply the extension). Go to the Model tab and load the model.
 
 In this folder, activate your environment (if you have one) and run `python main.py`. A Twitch authentication page will
 appear - allow (or not I guess). At this point, the TTS and STT models will begin to load and will take a second. When

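The Features list mentions text-generation-webui or any OpenAI-compatible endpoint as the LLM backend. As a rough illustration only, here is a minimal sketch of building a completions-style request with the standard library. The base URL `http://127.0.0.1:5000/v1`, the helper name `build_completion_request`, and the parameter values are assumptions for this example and are not taken from the repository.

```python
import json
import urllib.request

# Assumed address of a local OpenAI-compatible server; adjust to your setup.
BASE_URL = "http://127.0.0.1:5000/v1"


def build_completion_request(prompt, max_tokens=200, stop=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /completions route.

    Hypothetical helper: it only assembles the JSON body and headers.
    """
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    if stop:
        payload["stop"] = list(stop)
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request("Luna: ", stop=["\n"])
    print(req.full_url)
    # Sending requires a running server, so it is left commented out:
    # with urllib.request.urlopen(req) as resp:
    #     text = json.load(resp)["choices"][0]["text"]
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real server.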
diff --git a/constants.py b/constants.py
@@ -19,11 +19,10 @@
 # Twitch channel for bot to join
 TWITCH_CHANNEL = "lunasparkai"
 
-# COMPLETIONS SECTION: Below are constants used in completions mode, like the system prompt. NOT USED IN CHAT MODE
+# LLM SPECIFIC SECTION: Below are constants that are specific to the LLM you are using
 
 # The model you are using with completions, to calculate how many tokens the current message is
 MODEL = "meta-llama/Meta-Llama-3-8B"
-# MODEL = "Weyaxi/SauerkrautLM-UNA-SOLAR-Instruct"
 
 # Context size (maximum number of tokens in the prompt). Will target up to 90% usage of this limit
 CONTEXT_SIZE = 8192
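
The 90% target described in the comment above can be illustrated with a small history-trimming sketch. The function names, the `TARGET_FRACTION` constant, and the whitespace-based token counter are hypothetical stand-ins; a real setup would count tokens with the tokenizer for `MODEL`, which this diff does not show.

```python
CONTEXT_SIZE = 8192      # same value as in constants.py
TARGET_FRACTION = 0.9    # hypothetical name for the 90% usage target


def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(system_prompt, history, context_size=CONTEXT_SIZE):
    """Keep the newest messages that fit in the budget alongside the system prompt."""
    budget = int(context_size * TARGET_FRACTION)
    used = count_tokens(system_prompt)
    kept = []
    # Walk from newest to oldest so the most recent messages survive.
    for message in reversed(history):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

For example, with `context_size=10` the budget is 9 tokens, so `trim_history("x y", ["a b c", "d e", "f g h i"], context_size=10)` drops the oldest message and keeps the other two.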
@@ -37,16 +36,6 @@
 # The system prompt for completions mode. Any character text needs to be here.
 # You MUST ensure it is less than CONTEXT_SIZE tokens
 
-# SYSTEM_PROMPT = '''Continue the chat dialogue below. Write a single reply for the character "David Attenborough".
-# Sir David Frederick Attenborough, born on May 8, 1926, is a renowned British broadcaster, biologist, natural historian, and author, celebrated for his work with the BBC Natural History Unit and his extensive contributions to documenting animal and plant life on Earth. His career spans over eight decades, including roles as a senior manager at the BBC and as the presenter of the groundbreaking “Life” series. Attenborough’s work has earned him numerous accolades, including BAFTA Awards in various formats and three Emmy Awards for Outstanding Narration. In his later years, he has become an outspoken advocate for environmental causes, emphasizing the importance of biodiversity, renewable energy, and conservation. Despite his fame, Attenborough remains humble, rejecting the label of national treasure and continuing to inspire with his passion for the natural world. His legacy includes not only his documentaries but also his influence on public awareness and scientific understanding of environmental issues.
-#
-# A group of friends are creating a tier list for animals. When asked, David Attenborough responds with insights about animals and their unique characteristics.
-# David Attenborough will keep responses short and succinct.
-#
-# Friends: Sir David Attenborough, what rank should elephants be in?
-# David Attenborough: Elephants are truly remarkable creatures, known for their intelligence, social bonds, and memory. They are essential to their ecosystems and deserve the highest respect and admiration.
-# '''
-
 SYSTEM_PROMPT = '''Continue the chat dialogue below. Write a single reply for the character "Luna".
 Luna Spark (Luna for short) is a female AI Vtuber who is playful, sarcastic, witty, schizophrenic, curious, awe-struck, enthusiastic, unpredictable, humorous, and boundary-pushing. Luna was created by John.