
A tutorial on locality sensitive hashing, using MinHashing for document similarity and CosineSimilarity for Euclidean space similarity.

xunge/Locality-sensitive-hashing-tutorial

 
 

Locality Sensitive Hashing Tutorial

As the name suggests, this is a tutorial on locality sensitive hashing. All of the information is contained in the notebook.

The sampledocs folder contains some artificial data for the document similarity task. It consists of news articles pulled from CNN, plus one document built from partial concatenations of the others. This creates artificially similar documents, which our algorithms are trying to find.
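To make the document similarity task concrete, here is a minimal MinHash sketch using only the standard library. It is not taken from the notebook: the function names, the character-shingle size, the number of hash functions, and the example strings (stand-ins for the files in sampledocs) are all assumptions chosen for illustration.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of character k-shingles of a whitespace-normalised string."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def _hash(shingle, seed):
    """Deterministic 64-bit hash of a shingle; each seed acts as a separate hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


# Hypothetical stand-ins for documents; the third mixes parts of the first two.
doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "stock markets closed higher today after a volatile week of trading"
doc_c = "the quick brown fox jumps over the lazy dog after a volatile week"

sets = [shingles(d) for d in (doc_a, doc_b, doc_c)]
sigs = [minhash_signature(s) for s in sets]

print("exact     a~c:", jaccard(sets[0], sets[2]))
print("estimated a~c:", estimated_jaccard(sigs[0], sigs[2]))
print("estimated a~b:", estimated_jaccard(sigs[0], sigs[1]))
```

The concatenated document should score noticeably higher against its source than two unrelated documents do, which is the signal the similarity search relies on. More hash functions give a tighter estimate at the cost of longer signatures.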

For the vector similarity task, synthetic data is easy to generate by creating random matrices, so we do that in the notebook.
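As a rough illustration of the vector case, the sketch below builds a random matrix and hashes its rows with random hyperplanes, a common LSH family for cosine similarity. Whether the notebook uses exactly this scheme is an assumption, as are the function names, the dimensions, and the choice of the standard library over NumPy (which keeps the example dependency-free).

```python
import math
import random


def random_matrix(n_rows, n_cols, rng):
    """Matrix of independent standard normal entries, stored as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


def cosine_similarity(u, v):
    """Exact cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hyperplane_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        int(sum(h * x for h, x in zip(plane, vector)) >= 0.0)
        for plane in hyperplanes
    )


def estimated_cosine(sig_a, sig_b):
    """Bits agree with probability 1 - theta/pi, so invert that to recover cos(theta)."""
    agree = sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
    return math.cos(math.pi * (1.0 - agree))


rng = random.Random(0)                    # fixed seed so runs are repeatable
data = random_matrix(5, 20, rng)          # 5 synthetic vectors in 20 dimensions
planes = random_matrix(256, 20, rng)      # 256 random hyperplanes, one bit each

# Add a noisy copy of the first row so there is a genuinely similar pair.
data.append([x + 0.1 * rng.gauss(0.0, 1.0) for x in data[0]])

sigs = [hyperplane_signature(v, planes) for v in data]
print("exact    :", cosine_similarity(data[0], data[-1]))
print("estimated:", estimated_cosine(sigs[0], sigs[-1]))
```

Independent random vectors in high dimensions are nearly orthogonal, so a deliberately perturbed copy is added to give the search something to find.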


Languages

  • Jupyter Notebook 93.7%
  • Python 6.3%