GenAI_FinancialReports_Interface_for_Analysts

A financial reports (PDF) interface for analysts, built with LangChain (deeplearning.ai) and Gen AI (ChatGPT-4) to pull information from documents, with a dashboard created using Gradio.

Problem statement: Use Gen AI to develop search and chat functionality that pulls relevant information from one or more documents.
Model structure: The key steps in the model construction are: Set up/create the question set database → Upload PDF and generate embeddings → Summarize the contents → Questions box → Retrieval of pre-defined questions → Get answers to the pre-defined questions.

Question and Answer Interface (Chatbot) for Analyst

This project uses OpenAI's GPT-4 to create a chatbot that analyzes the contents of PDF documents. It combines LangChain, OCRmyPDF, SQLite, and Gradio so that users can upload PDFs, generate embeddings, retrieve predefined questions, and receive answers.

Credits

lekkalar for the ChatGPT PDF model via Hugging Face.
Gradio API for creating the frontend interface.

Overview

This chatbot is designed to interact with PDF documents. Users can upload PDF files, which will be processed to extract content. The extracted content can then be queried using predefined questions or user-defined inputs. The system is built using various tools, including OCR for text extraction and vector stores for efficient querying.

Project Structure

The code is structured into several functions and classes that manage different parts of the process:

Database Management: Functions that handle the connection to SQLite databases to store and retrieve questions.
Document Processing and Embedding Generation: Functions that load PDF documents, convert them to embeddings, and prepare them for querying.
Gradio Interface: Code to create an interactive web interface using Gradio.

Main Steps

Upload PDF and Generate Embeddings: Users can upload a PDF file. The application extracts content from the PDF, generates embeddings using OpenAI's models, and stores them for fast retrieval.
Summarize the Contents: The application will summarize the uploaded PDF's contents, making it easier for users to understand the key information.
Questions Box for Analyst: Analysts can type their questions directly into the interface and receive answers generated by GPT-4.
Retrieval of Pre-defined Questions: A feature to retrieve a list of pre-defined questions from the database, allowing users to quickly access commonly asked queries.
Get Answer(s) to Pre-defined Questions: Once the user selects any predefined question set, the system retrieves the relevant questions and provides answers based on the uploaded PDF content.

Database Management

Create DB Connection and Tables

This component handles all database interactions. The following functions are defined:

create_db_connection: Establishes a connection to an SQLite database.
create_sqlite_table: Creates tables for storing question sets if they do not already exist.
load_master_questionset_into_sqlite: Loads predefined questions from a master list into the SQLite database for easy access.
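
The three functions above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names come from this README, but their signatures, the table schema, the sample rows, and the `retrieve_questions` helper are assumptions made for the example.

```python
import sqlite3


def create_db_connection(db_path=":memory:"):
    """Open a SQLite connection (in-memory by default for this sketch)."""
    return sqlite3.connect(db_path)


def create_sqlite_table(conn):
    """Create the question table only if it does not already exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS questions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            document_type TEXT NOT NULL,
            field TEXT NOT NULL,
            question TEXT NOT NULL
        )
        """
    )
    conn.commit()


def load_master_questionset_into_sqlite(conn, master_questions):
    """Insert (document_type, field, question) rows from a master list."""
    conn.executemany(
        "INSERT INTO questions (document_type, field, question) VALUES (?, ?, ?)",
        master_questions,
    )
    conn.commit()


def retrieve_questions(conn, document_type):
    """Hypothetical helper: fetch the question set for one document type."""
    cursor = conn.execute(
        "SELECT field, question FROM questions "
        "WHERE document_type = ? ORDER BY id",
        (document_type,),
    )
    return cursor.fetchall()


# Illustrative rows only; the real master list is not shown in this README.
MASTER_QUESTIONS = [
    ("annual_report", "revenue", "What was the total revenue for the year?"),
    ("annual_report", "net_income", "What was the net income for the year?"),
    ("quarterly_report", "revenue", "What was the revenue for the quarter?"),
]

conn = create_db_connection()
create_sqlite_table(conn)
load_master_questionset_into_sqlite(conn, MASTER_QUESTIONS)
annual_questions = retrieve_questions(conn, "annual_report")
```

Keying each row by document type is one way to let the interface pull a different question set for each kind of uploaded report.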

Question Set Management

Functions are defined to create and populate the master question list for different document types, allowing the chatbot to pull contextually relevant questions for different PDF uploads.

Document Processing and Embedding Generation

Load PDF and Generate Embeddings

This section contains the core logic for processing the uploaded PDFs. Key functions include:

load_pdf_and_generate_embeddings: Loads the PDF, applies OCR if necessary, splits the content into usable text chunks, and generates embeddings that can be searched against.
ocr_converter: Uses OCRmyPDF to ensure text is readable and embedded properly.

The application relies on several libraries, such as LangChain and OpenAI, to create embeddings and respond to user queries.
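
To make the chunk, embed, and search idea concrete, here is a small standard-library sketch. It is not the project's implementation: the bag-of-words `toy_embedding` is a deliberately simple stand-in for OpenAI embeddings, the character-based splitter stands in for a LangChain text splitter, and all function names and parameters here are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embedding(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks):
    """Pair every chunk with its embedding (a minimal 'vector store')."""
    return [(chunk, toy_embedding(chunk)) for chunk in chunks]


def retrieve(index, query, k=1):
    """Return the k chunks most similar to the query."""
    query_vector = toy_embedding(query)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


# Made-up text standing in for the extracted contents of a PDF.
report_text = (
    "Revenue grew ten percent in 2023. "
    "The board appointed a new auditor during the year."
)
index = build_index(split_into_chunks(report_text, chunk_size=40, overlap=10))
best_chunks = retrieve(index, "How much did revenue grow?", k=1)
```

In the real application the retrieved chunks would be passed to GPT-4 as context for the answer. The overlap keeps a sentence that straddles a chunk boundary from being lost to both neighbours.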

Gradio Interface

The Gradio API is used to create an interactive user interface, which consists of:

A file upload section for PDFs.
Text boxes for entering queries.
Dropdowns to select pre-defined question sets.
A dataframe to display the fields and retrieved questions.

The complete interface allows users to upload documents, ask questions, and retrieve stored Q&A sets seamlessly.
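
A layout with these four components might be wired up along the following lines. This is a sketch under assumptions rather than the project's code: the handler functions, the in-memory `QUESTION_SETS` dictionary (standing in for the SQLite tables), and the placeholder answer are all hypothetical, and Gradio is imported lazily inside `build_interface` so the helper functions can be used without it.

```python
# Hypothetical in-memory stand-in for the question sets stored in SQLite.
QUESTION_SETS = {
    "annual_report": [
        ("revenue", "What was the total revenue for the year?"),
        ("net_income", "What was the net income for the year?"),
    ],
    "quarterly_report": [
        ("revenue", "What was the revenue for the quarter?"),
    ],
}


def list_question_sets():
    """Names offered in the dropdown."""
    return sorted(QUESTION_SETS)


def retrieve_question_set(name):
    """Rows for the dataframe: one [field, question] pair per row."""
    return [[field, question] for field, question in QUESTION_SETS.get(name, [])]


def answer_question(pdf_file, question):
    """Placeholder handler; the real app would query the embedded PDF with GPT-4."""
    if not pdf_file:
        return "Please upload a PDF first."
    if not question or not question.strip():
        return "Please enter a question."
    return f"(placeholder) Would answer {question!r} using the uploaded PDF."


def build_interface():
    """Assemble the Gradio Blocks layout described above."""
    import gradio as gr  # lazy import: only needed when the UI is built

    with gr.Blocks(title="Financial Reports Q&A") as demo:
        pdf_file = gr.File(label="Upload PDF", file_types=[".pdf"])
        question_box = gr.Textbox(label="Your question")
        ask_button = gr.Button("Ask")
        answer_box = gr.Textbox(label="Answer")

        set_dropdown = gr.Dropdown(
            choices=list_question_sets(), label="Pre-defined question set"
        )
        retrieve_button = gr.Button("Retrieve questions")
        questions_table = gr.Dataframe(headers=["Field", "Question"])

        ask_button.click(
            answer_question, inputs=[pdf_file, question_box], outputs=answer_box
        )
        retrieve_button.click(
            retrieve_question_set, inputs=set_dropdown, outputs=questions_table
        )
    return demo
```

Calling `build_interface().launch()` would start the local Gradio server. Keeping the handlers as plain functions makes them easy to test without starting the UI.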

Challenges and Lessons Learned

Some Lessons

Break down the requirements and the problem statement before starting model development.
Python proved very useful.

Challenges

Deprecated tools can slow down development.
ChatGPT requires a subscription.
Searching for relevant existing models can take some trial and error, and some time.

Communities and Users Forum

Building a model from open-source components requires active participation in communities.
Many questions and “errors” are likely to have been encountered by others before.

Summary

The exercise identified many areas for further analysis.
Tools and tips are available to those who search for them.

Potential Enhancements

Improved question set

Financial reports are relatively standard. However, many organizations use slightly different terminology for the same concept.
The model can be developed so that minor differences in terms do not limit the usability of the interface.
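
One simple way to approach this enhancement is to map term variants to a canonical term before a question is matched or answered. The sketch below only illustrates that idea: the synonym table and the function names are hypothetical and are not part of the current project.

```python
import re

# Hypothetical synonym table: canonical term -> variants seen in reports.
CANONICAL_TERMS = {
    "revenue": ["revenue", "turnover", "sales"],
    "net income": ["net income", "net profit", "profit after tax"],
}


def build_lookup(canonical_terms):
    """Invert the table so each lowercase variant maps to its canonical term."""
    lookup = {}
    for canonical, variants in canonical_terms.items():
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


def normalize_question(question, canonical_terms=CANONICAL_TERMS):
    """Replace known term variants in a question with their canonical form."""
    lookup = build_lookup(canonical_terms)
    # Try longer variants first so multi-word terms are preferred.
    variants = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(1).lower()], question)


normalized = normalize_question("What was the turnover in 2023?")
```

A fixed table like this only covers the variants listed in it; an embedding-based match would be a more flexible alternative.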

Application Challenges

A subscription to Google Colab Pro helped speed up development of the model.
Next steps include more fine-tuning, more rigorous evaluation, and moving beyond proof of concept.

