XccelerateOrg/VectorDB_Storage_Retrieval

Document embedding and semantic search retrieval

This is a simple document embedding project that demonstrates how to embed documents with a pre-trained Hugging Face model and then retrieve them by semantic search. At the time of writing, BGE from BAAI (Beijing Academy of Artificial Intelligence, also known as the Zhiyuan Institute) is a very good text embedding model.
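
The retrieval idea can be illustrated with a minimal sketch. It is not the code of this project: `build_vocab`, `toy_embed`, and `search` are hypothetical names, and `toy_embed` is a bag-of-words stand-in for a real embedding model such as BGE, chosen so the example runs without downloading anything. A real model captures meaning beyond shared words; the sketch only shows the mechanism of embedding each document, embedding the query, and ranking by cosine similarity.

```python
import math


def build_vocab(documents):
    """Assign one vector dimension to every distinct word in the corpus."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}


def toy_embed(text, vocab):
    """Hypothetical stand-in for a real embedding model: word counts, L2-normalised."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:  # words outside the corpus vocabulary are ignored
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Dot product of two unit-length vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def search(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    vocab = build_vocab(documents)
    index = [(doc, toy_embed(doc, vocab)) for doc in documents]
    q = toy_embed(query, vocab)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


docs = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "python virtual environments isolate packages",
]
best = search("cat mat", docs)
```

With a real model, only `toy_embed` would change; the ranking step stays the same.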

To run this project, first install the requirements. I highly recommend creating a new virtual environment.

To create a new virtual environment:

python -m venv my_virtual_env

Activate the environment:

source ./my_virtual_env/bin/activate

Install the requirements:

pip install -r requirements.txt

I developed this project with Python 3.10. Please use the same version for maximum compatibility.

To add documents to the vectorDB:

python make_knowledge_DB.py --dir "<directory with pdf books>" 
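
For orientation, the `--dir` argument could be handled roughly as follows. This is a sketch under assumptions, not the contents of make_knowledge_DB.py: the function names `parse_args` and `find_pdfs` are hypothetical, and searching subdirectories recursively is an assumption about the intended behaviour.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; only the --dir flag comes from the README."""
    parser = argparse.ArgumentParser(description="Collect PDF files to embed.")
    parser.add_argument("--dir", required=True, help="directory with PDF books")
    return parser.parse_args(argv)


def find_pdfs(directory):
    """Return all PDF files under `directory`, sorted by path (recursion is assumed)."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Example: args = parse_args(["--dir", "books"]); pdfs = find_pdfs(args.dir)
```

Each file found this way would then be split into text chunks and embedded before being stored.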

To query the database and retrieve information:

python get_data_from_database.py

At the prompt, ask a question to retrieve documents semantically related to it, or enter q to quit.
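
The prompt behaviour described above can be sketched as a small loop. This is an illustration only, not the code of get_data_from_database.py: `query_loop` and its `retrieve`, `read`, and `write` parameters are hypothetical, and `retrieve` stands for whatever function looks up documents in the vector database.

```python
def query_loop(retrieve, read=input, write=print):
    """Prompt for questions until the user enters q.

    `retrieve` is any callable that maps a question to a list of document texts.
    `read` and `write` default to the console and can be swapped out for testing.
    """
    while True:
        question = read("Question (q to quit): ").strip()
        if question.lower() == "q":
            break
        if not question:
            continue  # ignore empty input and prompt again
        for rank, text in enumerate(retrieve(question), start=1):
            write(f"{rank}. {text}")
```

Passing the retrieval function in as an argument keeps the loop independent of any particular database or embedding model.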
