Skip to content

argusdusty/Ferret

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ferret: An optimized substring search engine written in Go.
Makes use of a combination of an Inverted Index and a Suffix Array to allow log-time lookups with a relatively small memory footprint.
Also incorporates error-correction (Levenshtein distance 1) and simple Unicode-to-ASCII conversion.
Allows for arbitrary sorting functions

License: CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)
You are free to share and remix this work, as long as you attribute the original author, and share it under a similar license
Author: Mark Canning

To Install: go get github.com/argusdusty/Ferret
To Update: go get -u github.com/argusdusty/Ferret
To Use: import "github.com/argusdusty/Ferret"
To try the examples: go build example.go && go build dictionaryexample.go

Check out example.go and dictionaryexample.go for example usage. The code is meant to be as fast as possible for a substring dictionary search, and as such is best suited for large dictionaries with >1,000,000 total characters. I've timed 20s initialization for 3.5 million characters (350,000 entries) on a modern CPU, and 200us search time (3ms with error-correction).

Uses linear memory (approximately 2-3 times the size of the dictionary)
Searches performed in log time with the number of characters in the dictionary.
Sorted searches can be slow, taking ~linear time with the number of matches, rather than linear time with the results limit.
SortedInvertedSuffixes are meant to speed this up by doing some presorting, grouping possible results into categories by a predefined score, then searching the 'top' categories first.

About

An optimized substring search engine written in Go

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages