Document embedding and semantic search retrieval

This is a simple document embedding project that demonstrates how to use a pre-trained Hugging Face model to embed documents and then retrieve them by semantic search. At the time of writing, BGE from BAAI (the Beijing Academy of Artificial Intelligence, also known as the Zhiyuan Institute) is a very good text embedding model.
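To illustrate the retrieval idea, here is a minimal sketch of ranking documents by cosine similarity to a query. It is not this project's code: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model such as BGE, used only so the example runs without downloading anything, and `retrieve` and `documents` are likewise illustrative names. A real model would produce dense vectors that capture meaning rather than word overlap, but the ranking step would look much the same.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


documents = [
    "vector databases store embeddings",
    "python virtual environments isolate packages",
    "embeddings map text to vectors",
]
best = retrieve("how do vector databases store embeddings", documents, top_k=1)
print(best)
```

Swapping `toy_embed` for a call to a real model is the only change needed to make the ranking semantic rather than lexical.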

To run this project, first install the requirements. I highly recommend creating a new virtual environment for it.

To create a new virtual environment:

python -m venv my_virtual_env

Activate the environment:

source ./my_virtual_env/bin/activate

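Install the requirements (the repository lists this file as requirementx.txt, so substitute that name if pip cannot find requirements.txt):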

pip install -r requirements.txt

I am using Python 3.10. Please use the same version for maximum compatibility.
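If you want to confirm the interpreter version from inside the virtual environment, a small check like the following could help. This is only a sketch: `RECOMMENDED` and `is_recommended_python` are hypothetical names, not part of this project.

```python
import sys

# Version the README recommends; hypothetical constant for this sketch.
RECOMMENDED = (3, 10)


def is_recommended_python(version_info=sys.version_info) -> bool:
    """True when the major and minor version match the recommended one."""
    return tuple(version_info[:2]) == RECOMMENDED


if not is_recommended_python():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Warning: running Python {found}; Python 3.10 is recommended.")
```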

To add documents to the vector database:

python make_knowledge_DB.py --dir "<directory with pdf books>"
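For readers curious how a script might accept a `--dir` argument and locate the PDF files to ingest, here is a minimal sketch using only the standard library. It is an assumption about the general shape of such a script, not the actual contents of make_knowledge_DB.py; `find_pdfs` and `build_parser` are hypothetical helpers, and the embedding and storage steps are left out.

```python
import argparse
from pathlib import Path


def find_pdfs(directory: str) -> list[Path]:
    """Recursively collect PDF files under directory (case-insensitive suffix)."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{directory!r} is not a directory")
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with a single required --dir option."""
    parser = argparse.ArgumentParser(description="Collect PDF books for embedding.")
    parser.add_argument("--dir", required=True, help="directory with PDF books")
    return parser


def main(argv: list[str]) -> None:
    """Parse arguments and list the PDFs that would be processed."""
    args = build_parser().parse_args(argv)
    for pdf_path in find_pdfs(args.dir):
        print(pdf_path)
```

Calling `main(["--dir", "books"])` would print every PDF found under a `books` folder; each path could then be passed to whatever text extraction and embedding step the project uses.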

To query the database and retrieve information from it:

python get_data_from_database.py

At the prompt, ask a question to retrieve the documents semantically related to it, or enter q to quit.
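The prompt described above can be sketched as a simple loop. This is an illustrative outline under assumptions, not the code in get_data_from_database.py: `run_query_loop` is a hypothetical name, and `search` stands for any function that takes a question and returns related documents.

```python
def run_query_loop(search, read_line=input, write=print) -> int:
    """Prompt for questions until the user enters q; return how many were answered."""
    answered = 0
    while True:
        try:
            question = read_line("Ask a question (or q to quit): ").strip()
        except EOFError:
            # Treat end of input (for example Ctrl-D) like quitting.
            break
        if question.lower() == "q":
            break
        if not question:
            # Ignore empty lines and prompt again.
            continue
        for rank, document in enumerate(search(question), start=1):
            write(f"{rank}. {document}")
        answered += 1
    return answered
```

Passing `read_line` and `write` as parameters keeps the loop easy to exercise without a terminal; in normal use the defaults `input` and `print` apply.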

Repository: XccelerateOrg/VectorDB_Storage_Retrieval