
RAG (Retrieval-Augmented Generation) System using LangChain with Streamlit and Chainlit UI Interface

This project is a Retrieval-Augmented Generation (RAG) system for querying custom documents such as PDFs, and it can be extended to other document types. The system uses Streamlit and Chainlit to build the app's UI, and Pinecone and Chroma as vector databases.

Project Overview

The flow of the project is illustrated below:

graph TD
    A[Query] --> B[Embedding Model] --> C[Vector Database]
    G[Documents] --> H[Documents Loader] --> I[Splitter] --> B[Embedding Model]
    C[Vector Database] --> D[Context + Question] --> E[LLM] --> F[Answer]
    A[Query] --> D[Context + Question]

Components

  1. Document Loaders: Responsible for loading documents of various types (HTML, PDF, TXT) into the system.

  2. Splitter: Splits the loaded documents into manageable chunks for processing.

  3. Embedding Model: Converts document chunks into embeddings.

  4. VectorDB (Currently using Chroma and Pinecone): Stores and retrieves embeddings efficiently.

  5. Query: User's input query that needs to be answered.

  6. Context and Question: The context retrieved from the VectorDB and the user's question.

  7. LLM (Large Language Model): Processes the context and question to generate an answer.

    The interface for this flow was built with Streamlit and Chainlit.
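The components above can be sketched as a minimal, self-contained pipeline. This is an illustrative toy, not the repository's actual implementation: the bag-of-words "embedding" stands in for a real embedding model, the in-memory ranking stands in for the vector database, and all function names are assumptions.

```python
import math
from collections import Counter

def split_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping character chunks (the Splitter step)."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by similarity to the query (the vector DB lookup step)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context with the question for the LLM."""
    return "Context:\n" + "\n".join(context) + f"\nQuestion: {query}"
```

In the real system, `embed` is an embedding model, `retrieve` is a Pinecone/Chroma similarity search, and `build_prompt`'s output is sent to the LLM to generate the final answer.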

Demo

Smart Document Query System

Effortlessly search and retrieve information from your documents using our advanced RAG system. The sources are shown below the answer.

Smart Document Query System - Streamlit Interface

Smart Document Query System - Chainlit Interface

Gemini Assistant Chatbot

Ask any question and receive answers from our powerful AI.

Gemini Assistant Chatbot

Project Structure

.
├── LICENSE                             # MIT license
├── README.md                           # Project overview and instructions
├── SmartDocumentQueryST.py             # Main script for the Smart Document Query system - Streamlit Interface
├── SmartDocumentQueryCL.py             # Main script for the Smart Document Query system - Chainlit Interface
├── data_source
│   ├── download.py                     # Script to download data sources
│   ├── pdf_file                        # Directory containing PDF files for the project
│   └── pdf_file_test
│       ├── Attention_Is_All_You_Need.pdf  # Sample PDF file for testing
│       └── YOLOv10_Tutorials.pdf          # YOLOv10 document (used to test the Chainlit interface)
├── icons
│   └── chatbot.png                     # Icon for the chatbot
├── images
│   ├── demo_smart_document_query_st_ui.png   # Image for the Smart Document Query system demo - Streamlit Interface
│   ├── demo_smart_document_query_cl_ui.png   # Image for the Smart Document Query system demo - Chainlit Interface
│   └── gemini_assistant_chatbot_01.png       # Image for the Gemini Assistant Chatbot demo
├── notebooks
│   └── chatbot_using_rag.ipynb        # Notebook for building RAG with the open-source Vicuna model
├── pages
│   └── 1_GeminiChatBot.py              # Streamlit page for the Gemini Assistant Chatbot
├── requirements.txt                    # List of project dependencies
└── src
    ├── build_db.py                     # Script to build the vector database with Pinecone
    ├── const.py                        # File containing project constants
    ├── document_loaders
    │   ├── base.py                     # Base class for document loaders
    │   └── pdf.py                      # PDF document loader implementation
    ├── logger
    │   └── simple_logger.py            # Simple logger implementation
    ├── model
    │   └── llms.py                     # Script containing LLM-related functionalities
    ├── retriever.py                    # Script for the retriever functionality
    ├── splitters
    │   └── text_splitter.py            # Script for splitting text into chunks
    └── vector_db
        ├── base.py                     # Base class for vector databases
        ├── chroma_db.py                # Chroma vector database class
        └── pinecone_db.py              # Pinecone vector database class
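The `base.py` files in `document_loaders` and `vector_db` suggest an abstract-base-class pattern: a shared interface that concrete implementations (PDF loader, Chroma, Pinecone) fill in. A minimal sketch of that pattern follows; the class and method names here are assumptions for illustration, not the repository's actual API.

```python
from abc import ABC, abstractmethod

class BaseDocumentLoader(ABC):
    """Common interface that concrete loaders (e.g. a PDF loader) implement."""

    @abstractmethod
    def load(self, path: str) -> list[str]:
        """Return the document's text content as a list of strings."""
        ...

class TxtLoader(BaseDocumentLoader):
    """Minimal concrete loader for plain-text files."""

    def load(self, path: str) -> list[str]:
        # Read the whole file as a single "page" of text.
        with open(path, encoding="utf-8") as f:
            return [f.read()]
```

The same shape applies to `vector_db/base.py`: a base class declaring operations such as adding embeddings and querying, with `chroma_db.py` and `pinecone_db.py` providing backend-specific implementations.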

How to Run

Prerequisites

  • Python 3.8+
  • Streamlit
  • Chainlit
  • LangChain
  • Pinecone

Installation

  1. Clone the repository:

    git clone https://github.com/nguyenhads/rag_system_with_ui_interface.git
    cd rag_system_with_ui_interface
  2. Create a virtual environment and install the required packages:

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
  3. Set up Pinecone:

    • Sign up at Pinecone and get your API key.
    • Create an index on Pinecone for storing document embeddings.
    • Update src/const.py with your INDEX_NAME and NAME_SPACE (optional)
  4. Set up your environment variables:

    • Create a .env file in the project root and add your OpenAI, Pinecone, and Google API keys (examples are provided in .env.sample):
    # API keys
    OPENAI_API_KEY=your-openai-api-key
    PINECONE_API_KEY=your-pinecone-api-key
    GOOGLE_API_KEY=your-google-api-key
    
    # Python path
    PYTHONPATH=/path/to/src/folder
  5. Build the vector store DB using Pinecone:

    python src/build_db.py <folder_containing_pdf_files>
  6. Run the Streamlit app or the Chainlit app:

    streamlit run SmartDocumentQueryST.py
    
    chainlit run SmartDocumentQueryCL.py
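Before launching either app, it can help to fail fast when a required key from step 4 is missing. A small helper along these lines (illustrative, not part of the repository; the variable names match the .env example above):

```python
import os

# Keys the apps and build_db.py rely on (names taken from the .env example above).
REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "GOOGLE_API_KEY"]

def missing_env(keys: list[str]) -> list[str]:
    """Return the names of environment variables that are unset or empty."""
    return [k for k in keys if not os.getenv(k)]
```

Calling `missing_env(REQUIRED_KEYS)` at startup and aborting with a clear message when the result is non-empty avoids confusing authentication errors deep inside the LangChain or Pinecone calls.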
    

Acknowledgements

License

This project is licensed under the MIT License.
