In this project, we use the LLaVA multimodal LLM together with retrieval-augmented generation (RAG) to build a Visual Question Answering (VQA) chatbot.
Results are available on Jira.
- Clone the project:
git clone ...
cd Receipts_VQA/
- Create and activate a conda environment:
conda create --name <ENV_NAME> python=3.11 -y
conda activate <ENV_NAME>
- Install torch (see pytorch.org for the install command that matches your platform).
- Install the dependencies listed in the requirements.txt file:
pip install -r requirements.txt
- Run the Streamlit server:
streamlit run app.py
- Access the application in your browser at http://localhost:8501.
- Start chatting with the assistant!
The app works as follows:
- The user uploads an image via the image upload field.
- The user enters a question about the uploaded image.
- The user's message and image are sent to the OCR engine and the LLaVA model for processing.
- The user's input, along with the chat history, is used to generate a response.
- The LLaVA model generates a response based on the patterns it learned during training.
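The flow above can be sketched in plain Python. Since the actual model wiring in app.py is not shown here, the OCR engine and LLaVA inference call are injected as plain functions (`ocr_fn`, `llava_fn`), and the helper names and prompt format below are illustrative assumptions, not the project's real API:

```python
def build_prompt(ocr_text, question, history):
    # Combine the OCR text extracted from the uploaded image,
    # the running chat history, and the new user question
    # into a single prompt for the model.
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return (
        f"Image text (OCR):\n{ocr_text}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"User: {question}\nAssistant:"
    )

def answer(image_bytes, question, history, ocr_fn, llava_fn):
    # ocr_fn and llava_fn stand in for the real OCR engine and
    # LLaVA inference call; they are injected so the flow is testable.
    ocr_text = ocr_fn(image_bytes)
    prompt = build_prompt(ocr_text, question, history)
    reply = llava_fn(prompt, image_bytes)
    history.append((question, reply))  # keep history for the next turn
    return reply

if __name__ == "__main__":
    # Mocked components for illustration only.
    fake_ocr = lambda img: "TOTAL: $12.50"
    fake_llava = lambda prompt, img: "The total on the receipt is $12.50."
    hist = []
    print(answer(b"...", "What is the total?", hist, fake_ocr, fake_llava))
```

In the real app, `ocr_fn` would be the OCR engine's text-extraction call and `llava_fn` the LLaVA model's generate call; injecting them keeps the chat loop independent of any particular model backend.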