This repo implements a locally hosted chatbot focused on question answering over financial data, built with LangChain and FastAPI.
- Install dependencies: `pip install -r requirements.txt`
- Run the app: `make start`
- To enable tracing, make sure `langchain-server` is running locally, and pass `tracing=True` to `get_chain` in `main.py`. You can find more documentation here.
- Open `localhost:9000` in your browser.
There are two components: ingestion and question-answering.
Ingestion has the following steps:
- Pull the PDF from the Federal Reserve
- Load the PDF with LangChain's `PyPDFLoader`
- Split the documents with LangChain's `TextSplitter`
- Create a vectorstore of embeddings, using LangChain's vectorstore wrapper (with OpenAI's embeddings and FAISS vectorstore).
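The splitting step above can be sketched in plain Python. This is a minimal character-based splitter with overlap, similar in spirit to LangChain's `TextSplitter`; the `chunk_size` and `overlap` values are illustrative, not the repo's actual parameters:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks (sizes are illustrative)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so consecutive chunks overlap,
        # preserving context at chunk boundaries.
        start += chunk_size - overlap
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary fully visible in at least one chunk, which tends to improve retrieval quality.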
Question-Answering has the following steps, all handled by ChatVectorDBChain:
- Given the chat history and new user input, determine what a standalone question would be (using GPT-3).
- Given that standalone question, look up relevant documents from the vectorstore.
- Pass the standalone question and relevant documents to GPT-3 to generate a final answer.
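The three steps above can be sketched as a single plain function. In the repo this logic lives inside `ChatVectorDBChain`; here the LLM call and the vectorstore lookup are stubbed out as callables, and the prompt wording is illustrative:

```python
from typing import Callable

def answer_question(
    chat_history: list[tuple[str, str]],
    question: str,
    llm: Callable[[str], str],             # stands in for a GPT-3 call
    retrieve: Callable[[str], list[str]],  # stands in for a vectorstore lookup
) -> str:
    # 1. Condense chat history + new input into a standalone question.
    if chat_history:
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in chat_history)
        standalone = llm(
            f"Given this history:\n{history}\n"
            f"Rephrase '{question}' as a standalone question."
        )
    else:
        standalone = question
    # 2. Look up relevant documents for the standalone question.
    docs = retrieve(standalone)
    # 3. Pass the standalone question and documents to the LLM for a final answer.
    context = "\n".join(docs)
    return llm(f"Answer '{standalone}' using:\n{context}")
```

Condensing first means the vectorstore is always queried with a self-contained question, so follow-ups like "what about 2022?" still retrieve the right documents.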
- Implement LangChain + GPT-3.5 for the Federal Speeches application
- Unstructured file access based on LangChain
  - [ ] `.md`?
  - [x] `.pdf`
  - [ ] `.docx`?
  - [ ] `.txt`?
- Add support for other LLM models
- Move the vectorstore (chromadb?) to a client/server mode (this offers better performance and scalability than the in-memory mode)
- Implement time-based analysis
  - create a data structure using time as namespaces?
  - show graphs
  - align with graphs
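One possible shape for the "time as namespaces" idea above: key document chunks by period (e.g. year-quarter) so retrieval can be scoped to a time window. Everything here is hypothetical — the namespace format, class, and method names are a sketch, and a real version would wrap a vectorstore such as chromadb rather than a dict:

```python
from collections import defaultdict
from datetime import date

def quarter_namespace(d: date) -> str:
    """Hypothetical namespace key, e.g. '2023-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

class TimeScopedStore:
    """Toy in-memory store grouping chunks by time namespace."""

    def __init__(self) -> None:
        self._by_ns: dict[str, list[str]] = defaultdict(list)

    def add(self, published: date, chunk: str) -> None:
        # Route each chunk into the namespace for its publication quarter.
        self._by_ns[quarter_namespace(published)].append(chunk)

    def query(self, namespace: str) -> list[str]:
        # Retrieval scoped to a single time window.
        return list(self._by_ns.get(namespace, []))
```

Scoping retrieval this way would let the app answer "what did the Fed say in Q1 2023?" without cross-period chunks polluting the context.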