- Python == 3.8, numpy == 1.20.3, faiss
The tested datasets are available at https://www.cse.cuhk.edu.hk/systems/hash/gqr/datasets.html.
- Download the GIST dataset from ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz. The archive is large, so this step may take several minutes. The data format is described at http://corpus-texmex.irisa.fr/.
wget -O ./gist.tar.gz ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz --no-check-certificate
- Extract the dataset archive.
tar -zxvf ./gist.tar.gz -C ./
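To inspect the extracted vectors, a small reader along these lines may help. It is a minimal sketch, assuming the `.fvecs` layout described on the TEXMEX page (each vector stored as a 4-byte integer dimension followed by that many 4-byte floats). The function name `read_fvecs` and the example path are illustrative and not part of this repository.

```python
import numpy as np


def read_fvecs(path):
    """Read a .fvecs file into an (n, d) float32 array.

    Assumed layout: for every vector, one int32 holding d, then d float32 values.
    """
    raw = np.fromfile(path, dtype=np.int32)
    if raw.size == 0:
        return np.empty((0, 0), dtype=np.float32)
    d = int(raw[0])  # dimension stored in front of the first vector
    rows = raw.reshape(-1, d + 1)
    if not np.all(rows[:, 0] == d):
        raise ValueError("inconsistent vector dimensions in file")
    # Drop the leading dimension column and reinterpret the payload as float32.
    return rows[:, 1:].copy().view(np.float32)


# Hypothetical usage; the path depends on how the archive unpacks:
# base = read_fvecs("./gist/gist_base.fvecs")
# print(base.shape)
```

Note that this loads the whole file into memory, which may be significant for a dataset of this size.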
- Preprocess the dataset with a random orthogonal transformation.
python randomize.py
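For readers unfamiliar with this step, the idea can be sketched as follows. This is not the code in `randomize.py`. It is one common way to draw a random orthogonal matrix (QR decomposition of a Gaussian matrix), and the names `random_orthogonal` and `seed` are assumptions made for illustration.

```python
import numpy as np


def random_orthogonal(d, seed=0):
    """Draw a d x d random orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((d, d))
    q, r = np.linalg.qr(g)
    # Flip column signs using diag(r) so the distribution of q is uniform
    # over orthogonal matrices rather than biased by the QR sign convention.
    q = q * np.sign(np.diag(r))
    return q


# Rotating the data leaves Euclidean distances unchanged.
x = np.random.default_rng(1).standard_normal((5, 16))
p = random_orthogonal(16)
x_rot = x @ p
```

Because the transformation is orthogonal, nearest-neighbor relations under Euclidean distance are the same before and after it is applied. The same matrix would have to be applied to the queries as well.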
- Generate the IVF (inverted file) clustering of the dataset.
python ivf.py
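As a rough illustration of what an IVF clustering produces, the sketch below runs a plain NumPy k-means and groups vector ids by nearest centroid. It is a stand-in for explanation only: `ivf.py` may well use faiss, which is listed in the requirements, and the names `kmeans_ivf`, `k`, and `n_iter` are hypothetical.

```python
import numpy as np


def nearest_centroid(x, centroids):
    """Return, for each row of x, the index of its nearest centroid."""
    # Squared distances via ||x||^2 - 2 x.c + ||c||^2.
    d2 = (
        (x ** 2).sum(axis=1, keepdims=True)
        - 2.0 * x @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)


def kmeans_ivf(x, k, n_iter=10, seed=0):
    """Lloyd's k-means followed by grouping vector ids into k inverted lists."""
    x = np.asarray(x, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        assign = nearest_centroid(x, centroids)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    assign = nearest_centroid(x, centroids)
    lists = [np.flatnonzero(assign == j) for j in range(k)]
    return centroids, assign, lists
```

At query time an IVF index typically scans only the lists whose centroids are closest to the query, which is why this clustering is computed once up front.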