
Update Documentation with Multimodal
kimjammer committed Jun 12, 2024
1 parent c7afcad commit fbf6a57
1 changed file: README.md (12 additions, 0 deletions)
@@ -14,6 +14,7 @@ The original version was also created in only 7 days, so it is not exactly very
- Flexible LLM - Load any model into text-generation-webui (tested) or use any openai-compatible endpoint (not tested).
- Memory/RAG - Long-term (persists across restarts) memories can be manually added, but they will also be
automatically generated as the AI talks. (See memories/readme.md for details)
- Vision/Multimodality - Automatic screenshots and prompting of multimodal models. (See [Neuro-LLM-Server](https://github.com/kimjammer/Neuro-LLM-Server))

## Architecture

@@ -27,6 +28,14 @@ with the LLM. text-generation-webui and the LLM must be installed and started se
Alternatively, you can load any other model into text-generation-webui or modify constants.py to point to any other
OpenAI-compatible endpoint. Note that this project uses some parameters not available on the official OpenAI API.
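
For illustration, here is a minimal sketch of how such an endpoint can be called with the `openai` Python client. The base URL, model name, and extra sampling parameters below are placeholders rather than this project's actual configuration:

```python
from openai import OpenAI

# Placeholder endpoint - constants.py would point at text-generation-webui's
# OpenAI-compatible API (or any other compatible server).
client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello!"}],
    # Example of sampling parameters that text-generation-webui accepts but the
    # official OpenAI API does not; the names here are illustrative.
    extra_body={"min_p": 0.05, "repetition_penalty": 1.15},
)
print(response.choices[0].message.content)
```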

### Multimodal LLM

As with the text-only LLM, you can use any OpenAI-compatible endpoint (be careful: many OpenAI-like hosting servers
are not actually compatible) to access any multimodal model. However, I'm using [MiniCPM-Llama3-V-2_5-int4](https://github.com/OpenBMB/MiniCPM-V)
on my custom and extremely jank [Neuro-LLM-Server](https://github.com/kimjammer/Neuro-LLM-Server) to host the model.
This model has excellent high-resolution input handling and OCR, and is based on Llama 3, so its output is very similar to
that of the text-only mode. The int4 quantized version uses approximately 8GB of VRAM.
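
As a rough sketch (not this project's actual code), a screenshot can be sent to such an endpoint using the standard OpenAI vision message format, assuming the server accepts base64 data URLs; the base URL and file name are placeholders:

```python
import base64
from openai import OpenAI

# Placeholder endpoint - point this at whichever server hosts the multimodal model.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

# Encode a previously captured screenshot as a base64 data URL.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="MiniCPM-Llama3-V-2_5-int4",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening on screen."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```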

### STT

This project uses the excellent [KoljaB/RealtimeSTT](https://github.com/KoljaB/RealtimeSTT), which can transcribe an
@@ -120,6 +129,9 @@ First, install the CUDA 11.8 version of pytorch 2.2.2.

Install requirements.txt.

Use `pip list` to confirm that you still have the 2.2.2+cu118 versions of torch and torchaudio installed. If
they were overridden, use the first command to install them again.
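
As an alternative quick check (a sketch, not part of the original instructions), you can verify the installed build directly from Python:

```python
import torch
import torchaudio

print(torch.__version__)          # expect something like "2.2.2+cu118"
print(torchaudio.__version__)     # expect a matching 2.2.2 build
print(torch.cuda.is_available())  # should be True if the CUDA 11.8 build is intact
```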

DeepSpeed (for TTS) will need to be installed separately. I followed the instructions
from [AllTalkTTS](https://github.com/erew123/alltalk_tts?#-deepspeed-installation-options) and used their
[provided wheels](https://github.com/erew123/alltalk_tts/releases/tag/DeepSpeed-14.0).
