A Financial Reports (PDF) interface for analysts, built with LangChain (deeplearning.ai) and generative AI (GPT-4) to pull information from documents, with a dashboard created using Gradio.
- Problem statement: Use generative AI to develop search and chat functionality that pulls relevant information from one or more documents.
- Model structure: The key steps in building the model are: Set up/create the question set database → Upload PDF and generate embeddings → Summarize the contents → Questions box → Retrieval of predefined questions → Get answers to the predefined questions.
This project uses OpenAI's GPT-4 to create a chatbot that analyzes the contents of PDF documents. It combines several libraries and APIs so that users can upload PDFs, generate embeddings, retrieve predefined questions, and receive answers.
- Credits
- Installation
- Overview
- Project Structure
- Main Steps
- Database Management
- Document Processing and Embedding Generation
- Gradio Interface
- Challenges and Lessons Learned
- Potential Enhancements
Credits
- lekkalar for the ChatGPT PDF model via Hugging Face.
- Gradio API for creating the frontend interface.
This chatbot is designed to interact with PDF documents. Users can upload PDF files, which will be processed to extract content. The extracted content can then be queried using predefined questions or user-defined inputs. The system is built using various tools, including OCR for text extraction and vector stores for efficient querying.
The code is structured into several functions and classes that manage different parts of the process:
- Database Management: Functions that handle the connection to SQLite databases to store and retrieve questions.
- Document Processing and Embedding Generation: Functions that load PDF documents, convert them to embeddings, and prepare them for querying.
- Gradio Interface: Code to create an interactive web interface using Gradio.
- Upload PDF and Generate Embeddings: Users upload a PDF file. The application extracts the content, generates embeddings using OpenAI's models, and stores them for fast retrieval.
- Summarize the Contents: The application summarizes the uploaded PDF's contents, making it easier for users to grasp the key information.
- Questions Box for Analyst: Analysts can type their questions directly into the interface and receive answers generated by GPT-4.
- Retrieval of Predefined Questions: A feature that retrieves a list of predefined questions from the database, giving users quick access to commonly asked queries.
- Get Answer(s) to Predefined Questions: Once the user selects a predefined question set, the system retrieves the relevant questions and answers them based on the uploaded PDF content.
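The retrieval idea behind the question-answering steps can be sketched as follows. This is a minimal illustration, not the project's code: `embed` is a toy bag-of-words stand-in for OpenAI's embedding models, and `retrieve_chunks` and `build_prompt` are hypothetical helper names. The sketch only shows how a question is matched against stored chunks before a prompt is sent to the language model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_chunks(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Illustrative chunks; a real run would use text extracted from the uploaded PDF.
chunks = [
    "Total revenue for the year was 120 million.",
    "The board of directors met four times during the year.",
    "Net income increased compared with the prior year.",
]
question = "What was the total revenue?"
top = retrieve_chunks(question, chunks, top_k=1)
prompt = build_prompt(question, top)
```

In the actual application the embedding and answer generation would be handled by OpenAI's models through LangChain; only the ranking logic is shown here.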
Create DB Connection and Tables
This component handles all database interactions. The following functions are defined:
- create_db_connection: Establishes a connection to an SQLite database.
- create_sqlite_table: Creates tables for storing question sets if they do not already exist.
- load_master_questionset_into_sqlite: Loads predefined questions from a master list into the SQLite database for easy access.
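A minimal sketch of how these three functions might look with Python's built-in `sqlite3` module is given below. The function names come from the list above, but the signatures, the `questions` table name, and its columns (`document_type`, `field`, `question`) are assumptions made for illustration.

```python
import sqlite3


def create_db_connection(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection to an SQLite database (in-memory by default)."""
    return sqlite3.connect(db_path)


def create_sqlite_table(conn: sqlite3.Connection) -> None:
    """Create the questions table if it does not already exist.

    The schema is hypothetical; the project's real columns may differ.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS questions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            document_type TEXT NOT NULL,
            field TEXT NOT NULL,
            question TEXT NOT NULL
        )
        """
    )
    conn.commit()


def load_master_questionset_into_sqlite(
    conn: sqlite3.Connection,
    master_questions: list[tuple[str, str, str]],
) -> None:
    """Insert (document_type, field, question) rows from a master list."""
    conn.executemany(
        "INSERT INTO questions (document_type, field, question) VALUES (?, ?, ?)",
        master_questions,
    )
    conn.commit()


# Illustrative usage with made-up questions.
conn = create_db_connection()
create_sqlite_table(conn)
create_sqlite_table(conn)  # safe to call twice because of IF NOT EXISTS
load_master_questionset_into_sqlite(
    conn,
    [
        ("annual_report", "Revenue", "What was the total revenue for the year?"),
        ("annual_report", "Auditor", "Who audited the financial statements?"),
    ],
)
row_count = conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]
```

Using parameterized `?` placeholders keeps question text containing quotes from breaking the insert statement.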
Question Set Management
Functions are defined to create and populate the master question list for different document types, allowing the chatbot to pull contextually relevant questions for different PDF uploads.
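One way to organize a master question list by document type is sketched here. The document types, fields, questions, and the helper names `create_master_questionset` and `questions_for` are all hypothetical and serve only to illustrate the idea.

```python
# Hypothetical master list: document type -> list of (field, question) pairs.
MASTER_QUESTIONS: dict[str, list[tuple[str, str]]] = {
    "annual_report": [
        ("Revenue", "What was the total revenue for the year?"),
        ("Auditor", "Who audited the financial statements?"),
    ],
    "quarterly_report": [
        ("Revenue", "What was the revenue for the quarter?"),
    ],
}


def create_master_questionset() -> list[tuple[str, str, str]]:
    """Flatten the master list into (document_type, field, question) rows."""
    rows = []
    for document_type, items in MASTER_QUESTIONS.items():
        for field, question in items:
            rows.append((document_type, field, question))
    return rows


def questions_for(
    document_type: str, rows: list[tuple[str, str, str]]
) -> list[tuple[str, str]]:
    """Return the (field, question) pairs that apply to one document type."""
    return [(field, q) for doc_type, field, q in rows if doc_type == document_type]


rows = create_master_questionset()
quarterly = questions_for("quarterly_report", rows)
```

Rows in this flattened shape can be stored in a database table and filtered by document type when a PDF is uploaded.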
Load PDF and Generate Embeddings
This section contains the core logic for processing the uploaded PDFs. Key functions include:
- load_pdf_and_generate_embeddings: Loads the PDF, applies OCR if necessary, splits the content into usable text chunks, and generates embeddings that can be searched against.
- ocr_converter: Uses OCRmyPDF to ensure text is readable and embedded properly.
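As a rough illustration of the OCR step, the sketch below calls the OCRmyPDF command-line tool through `subprocess`. The function signature and the `build_ocr_command` helper are assumptions; the project's own `ocr_converter` may use different options or the OCRmyPDF Python API instead.

```python
import shutil
import subprocess


def build_ocr_command(input_pdf: str, output_pdf: str) -> list[str]:
    """Build the OCRmyPDF command line.

    --skip-text leaves pages that already contain text untouched,
    so only scanned (image-only) pages are OCRed.
    """
    return ["ocrmypdf", "--skip-text", input_pdf, output_pdf]


def ocr_converter(input_pdf: str, output_pdf: str) -> str:
    """Run OCRmyPDF and return the path of the searchable output PDF."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    # check=True raises CalledProcessError if OCRmyPDF exits with an error.
    subprocess.run(build_ocr_command(input_pdf, output_pdf), check=True)
    return output_pdf
```

Keeping the command construction in its own function makes the options easy to inspect without running OCR.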
The application relies on several libraries, such as LangChain and OpenAI, to create embeddings and respond to user queries.
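The chunking step can be illustrated with a small pure-Python splitter. This is a simplified stand-in for the text splitters that LangChain provides, not the project's code; `split_text` and its default sizes are assumptions chosen for the example.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    Overlap keeps a sentence that straddles a boundary visible in both
    neighboring chunks, which helps retrieval.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


example_chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

Each chunk would then be embedded and stored so that questions can be matched against the most relevant parts of the document.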
The Gradio API is used to create an interactive user interface, which consists of:
- A file upload section for PDFs.
- Text boxes for entering queries.
- Dropdowns to select pre-defined question sets.
- A dataframe to display the fields and retrieved questions.
The complete interface allows users to upload documents, ask questions, and retrieve stored Q&A sets seamlessly.
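A minimal layout with these components might look like the sketch below, using Gradio's Blocks API. The handler functions are placeholders with hypothetical names and sample data; in the real application they would call the embedding, retrieval, and database code.

```python
def answer_question(question: str) -> str:
    """Placeholder handler; the real app would query the vector store and GPT-4."""
    if not question.strip():
        return "Please enter a question."
    return f"(answer for: {question.strip()})"


def load_question_set(document_type: str) -> list[list[str]]:
    """Placeholder handler returning rows of [field, question]."""
    sample = {
        "annual_report": [["Revenue", "What was the total revenue for the year?"]],
    }
    return sample.get(document_type, [])


def build_interface():
    """Assemble the Gradio layout and wire the handlers to the components."""
    import gradio as gr  # imported here so the handlers work without Gradio

    with gr.Blocks() as demo:
        # Upload component; a real app would attach a processing handler to it.
        gr.File(label="Upload PDF", file_types=[".pdf"])
        question_box = gr.Textbox(label="Question")
        answer_box = gr.Textbox(label="Answer")
        ask_button = gr.Button("Ask")
        set_dropdown = gr.Dropdown(choices=["annual_report"], label="Question set")
        questions_table = gr.Dataframe(headers=["Field", "Question"])

        ask_button.click(fn=answer_question, inputs=question_box, outputs=answer_box)
        set_dropdown.change(
            fn=load_question_set, inputs=set_dropdown, outputs=questions_table
        )
    return demo
```

Calling `build_interface().launch()` starts the local web server. Keeping the handlers as plain functions lets them be tested without starting the interface.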
Some Lessons
- Break down the requirements and the problem statement before starting model development.
- Python has proved very useful.
Challenges
- Deprecated tools can slow down development.
- ChatGPT requires a subscription.
- Finding relevant existing models takes time and some trial and error.
Communities and User Forums
- Using open-source components to build a model requires active participation in their communities.
- Many questions and “errors” are likely to have been encountered by others before.
Summary
- The exercise identified many areas for further analysis.
- Tools and tips are available for those who search for them.
Improved question set
- Financial reports are relatively standard. However, many organizations use slightly different terminology for the same concept.
- The model can be developed so that minor differences in terminology do not limit the usability of the interface.
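One simple way to tolerate terminology differences is to normalize variant terms to a canonical form before matching questions against the document. The sketch below is only an illustration of that idea: the synonym table and the `normalize_terms` helper are hypothetical, and a real deployment would need a vocabulary reviewed by analysts.

```python
import re

# Hypothetical mapping from variant terms to a canonical term.
SYNONYMS = {
    "turnover": "revenue",
    "net sales": "revenue",
    "sales": "revenue",
    "net profit": "net income",
    "net earnings": "net income",
}

# Longest variants first so "net sales" is matched before "sales".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(v) for v in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b"
)


def normalize_terms(text: str) -> str:
    """Lowercase the text and replace known variants with canonical terms."""
    return _PATTERN.sub(lambda match: SYNONYMS[match.group(1)], text.lower())


normalized = normalize_terms("What was the Turnover?")
```

Applying the same normalization to both the question and the document text lets "turnover" in a question match "revenue" in a report.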
Application Challenges
- A Google Colab Pro subscription helped speed up development of the model.
- Next steps include more fine-tuning, more rigorous evaluation, and moving beyond the proof of concept.