Skip to content
forked from argusdusty/Ferret

An optimized substring search engine written in Go

Notifications You must be signed in to change notification settings

yinqunjun/Ferret

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ferret: An optimized substring search engine written in Go.
Makes use of a combination of an Inverted Index and a Suffix Array to allow log-time lookups with a relatively small memory footprint.
Also incorporates error-correction (Levenshtein distance 1) and simple Unicode-to-ASCII conversion.
Allows for arbitrary sorting functions
Allows you to map arbitrary data to your results, and quickly update this data.

License: CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)
You are free to share and remix this work, as long as you attribute the original author, and share it under a similar license
Author: Mark Canning

To Install: go get github.com/argusdusty/Ferret
To Update: go get -u github.com/argusdusty/Ferret
To Use: import "github.com/argusdusty/Ferret"
To try the examples: go build example.go && go build dictionaryexample.go

Check out example.go and dictionaryexample.go for example usage. The code is meant to be as fast as possible for a substring dictionary search, and as such is best suited for large dictionaries with >1,000,000 total characters. I've timed 10s initialization for 3.5 million characters (350,000 entries) on a modern CPU, and 15us search time (4000us with error-correction).

Uses linear memory (approximately 10 times the size of the dictionary)
Searches performed in log time with the number of characters in the dictionary.
Sorted searches can be slow, taking ~linear time with the number of matches, rather than linear time with the results limit.

About

An optimized substring search engine written in Go

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published