Approximate Vector Set Search: A Bio-Inspired Approach for High-Dimensional Spaces

Overview

This project, "Approximate Vector Set Search: A Bio-Inspired Approach for High-Dimensional Spaces," presents a novel method inspired by biological processes to address the challenges of searching in high-dimensional vector spaces. The approach leverages advanced computational techniques to improve the efficiency and accuracy of approximate vector set searches, a critical task in various domains such as bioinformatics, machine learning, and data mining.

CS vector sample dataset (Link, Access Code: Wzsb). Additional data available from public sources.

Repository Structure

The repository is organized into several key directories, each containing specific components of the project:

main.py and utils.py: Core scripts for running the main algorithm and utility functions.
ComparativeExperiment: Contains scripts and modules for comparative experiments to evaluate the proposed method against existing approaches.
_NaiveBioVSS: Includes a naive implementation of the BioVSS algorithm and related scripts.
C++code: Houses C++ implementations and CUDA code for performance-critical components.
data: Contains benchmark datasets used for testing and evaluation.
logs: Logging utilities for tracking experiments and results.
src: Core source code including data loading, filtering, and refinement modules.

Key Components

Main Algorithm

main.py: Entry point for executing the approximate vector set search algorithm.
utils.py: Utility functions supporting the main algorithm.

Comparative Experiments

ComparativeExperiment/CompareBase.py: Base class for comparative experiments.
ComparativeExperiment/main_c.py: Script to execute comparative experiments.
ComparativeExperiment/CompareMethod: Directory containing various comparison methods such as BruceExactHausdorff, IndexHNSWMean, IndexIVFFlatMean, etc.
ComparativeExperiment/DataLoader: Data loading utilities for comparative experiments.
ComparativeExperiment/logs/Logger.py: Logging utilities for experiments.

Naive Implementation

_NaiveBioVSS/hausdorff_distance_naive_lsh.cpp: C++ implementation of naive Hausdorff distance calculation using LSH.
_NaiveBioVSS/main.py: Main script for running the naive implementation.
_NaiveBioVSS/setup.py: Setup script for the naive implementation.
_NaiveBioVSS/test.py: Testing script for the naive implementation.

C++ and CUDA Code

C++code/cudaDenseHausdorff: Directory containing CUDA implementations for dense Hausdorff distance calculations.
C++code/cudaDenseHausdorff/haudorff: Subdirectory for basic Hausdorff distance calculations.
C++code/cudaDenseHausdorff/haudorff_lsh: Subdirectory for LSH-based Hausdorff distance calculations.
C++code/cudaDenseHausdorff/src: Source code for additional implementations.

Data and Logging

data/benchmark/CS/test: Test datasets for benchmarking.
logs/Logger.py: Logging utilities for the main algorithm.

Source Code

src/Pipeline.py: Pipeline for executing the vector set search.
src/DataLoader: Data loading modules for the main algorithm.
src/Filter: Filtering modules including BloomGraph, IndexCount, and OverlapVector.
src/Refinement: Refinement modules including ExactHausdorff and ParallelExactHausdorff.

Getting Started

Prerequisites

Python 3.10
Required Python libraries (specified in requirements.txt)

Installation

First download all the data and compile the C++ code. second, complete all data preparation using BioHash.

Clone the repository:

git clone https://github.com/CS-BruceChen/biovss.git

Navigate to the project directory:
```
cd BioVSS
```

Running the Algorithm

To execute the main algorithm, run:

python main.py

Experimental Evaluation

Detailed instructions for running comparative experiments can be found in the ComparativeExperiment directory.

Contributing

Contributions are welcome. Please submit a pull request or contact the authors for major changes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use our code for research work, please cite our paper as below:


@inproceedings{li2025approximate,

  title={Approximate Vector Set Search: A Bio-Inspired Approach for High-Dimensional Spaces},

  author={Li, Yiqi and Wang, Sheng and Chen, Zhiyu and Chen, Shangfeng and Peng, Zhiyong},

  booktitle={Proceedings of the 41th International Conference on Data Engineering},

  year={2025},

  publisher={IEEE},

  organization={IEEE}

}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Approximate Vector Set Search: A Bio-Inspired Approach for High-Dimensional Spaces

Overview

Repository Structure

Key Components

Main Algorithm

Comparative Experiments

Naive Implementation

C++ and CUDA Code

Data and Logging

Source Code

Getting Started

Prerequisites

Installation

Running the Algorithm

Experimental Evaluation

Contributing

License

Citation

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.idea		.idea
C++code/cudaDenseHausdorff		C++code/cudaDenseHausdorff
ComparativeExperiment		ComparativeExperiment
_NaiveBioVSS		_NaiveBioVSS
data/benchmark/CS		data/benchmark/CS
logs		logs
src		src
README.md		README.md
hausdorff_change_to_meanmin.zip		hausdorff_change_to_meanmin.zip
main.py		main.py
utils.py		utils.py

whu-totemdb/biovss

Folders and files

Latest commit

History

Repository files navigation

Approximate Vector Set Search: A Bio-Inspired Approach for High-Dimensional Spaces

Overview

Repository Structure

Key Components

Main Algorithm

Comparative Experiments

Naive Implementation

C++ and CUDA Code

Data and Logging

Source Code

Getting Started

Prerequisites

Installation

Running the Algorithm

Experimental Evaluation

Contributing

License

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages