GitHub - xukaijie111/uFuzzy at 7696d742091e2155d52ef7e974571aeada74d24c

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
demos		demos
dist		dist
src		src
.editorconfig		.editorconfig
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
jsconfig.json		jsconfig.json
package.json		package.json
rollup.config.js		rollup.config.js

Repository files navigation

▒ μFuzzy

A tiny, efficient, fuzzy search that doesn't suck. This is my fuzzy 🐈. There are many like it, but this one is mine.

Introduction

uFuzzy is a fuzzy search library designed to match a relatively short search phrase (needle) against a large list of short-to-medium phrases (haystack). It might be best described as a more forgiving String.prototype.indexOf(). Its performance leaves significant headroom for matching fuzzy terms out-of-order by combining matches from all permutations of the needle. When held just right, it can efficiently match against multiple properties, too. Common use cases are list filtering, auto-complete/suggest, and title/name/description/filename/function searches.

uFuzzy is intolerant of missing terms, and missing or out-of-order characters, so is a poor fit for applications like spellcheck or fulltext/document search.

Features

Junk-free, high quality results that are dataset-independent. No need to fine-tune indexing options or boosting params to attain some arbitrary quality score cut-off.
Straightforward fuzziness control that can be explained to your grandma in 5min.
Sorting you can reason about and customize using a simple Array.sort() which gets access to each match's stats/counters. There's no composite, black box "score" to understand.
Concise set of options that don't interact in mysterious ways to drastically alter combined behavior.
Fast with low resource usage - there's no index to build, so startup is below 1ms with near-zero memory overhead. Searching a three-term phrase in a 162,000 phrase dataset takes 11ms or 35ms with out-of-order terms.
Micro, with zero dependencies - currently < 3KB min

Demos

NOTE: The testdata.json file is a diverse 162,000 string/phrase dataset 4MB in size, so first load may be slow due to network transfer. Try refreshing once it's been cached by your browser.

First, uFuzzy in isolation to demonstrate its performance.

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=super%20ma

Now the same comparison page, booted with fuzzysort, QuickScore, and Fuse.js:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma

Here is the full library list but with a reduced dataset (just hearthstone_750, urls_and_titles_600) to avoid crashing your browser:

https://leeoniya.github.io/uFuzzy/demos/compare.html?lists=hearthstone_750,urls_and_titles_600&search=moo

Installation

Node

npm i ufuzzy

const uFuzzy = require('ufuzzy');

Browser

<script src="./dist/uFuzzy.iife.min.js"></script>

Usage

uFuzzy works in 3 phases:

Filter - This filters the full haystack with a fast RegExp compiled from your needle without doing any extra ops. It returns an array of matched indices in original order.
Info - This collects more detailed stats about the filtered matches, such as start offsets, fuzz level, prefix/suffix counters, etc. It also gathers substring match positions for range highlighting. To do all this it re-compiles the needle into two more-expensive RegExps that can partition each of the filtered matches. Therefore, it should be run on a reduced subset of the haystack, usually returned by the Filter phase. The uFuzzy demo is gated at <= 1,000 filtered items, before moving ahead with this Info phase.
Sort - This does an Array.sort() to determine final result order, probing the info object returned from the previous phase. A custom sort function can be provided via a uFuzzy option: {sort: (info, haystack, needle) => idxsOrder}.

let haystack = [
    'puzzle',
    'Super Awesome Thing (now with stuff!)',
    'FileName.js',
    '/feeding/the/catPic.jpg',
];

let opts = {};

let uf = new uFuzzy(opts);

let needle = 'feed cat';

// pre-filter
let idxs = uf.filter(haystack, needle);

// sort/rank only when <= 1,000 items
if (idxs.length <= 1e3) {
  let info = uf.info(idxs, haystack, needle);

  // order is a double-indirection array (a re-order of the passed-in idxs)
  // this allows corresponding info to be grabbed directly by idx, if needed
  let order = uf.sort(info, haystack, needle);

  // render filtered & ordered matches
  for (let i = 0; i < order.length; i++) {
    console.log(haystack[idxs[order[i]]]);
  }
}
else {
  // render filtered but unordered matches
  for (let i = 0; i < idxs.length; i++) {
    console.log(haystack[i]);
  }
}

A biased appraisal of similar work

Forget "apples and oranges"; comparing text search engines is more akin to "Cars vs Planes: A Toddler's Perspective". However, that doesnt mean we cannot gain some insight into a slice of operational behavior. This assessment is extremely narrow and, of course, biased towards my use cases, text corpus, and my complete expertise in operating my own library. It is highly probable that I'm not taking full advantage of some feature in other libraries that may significantly improve outcomes along some axis; I welcome improvement PRs from anyone with deeper library knowledge than afforded by my hasty 10min skim over any "Basic usage" example and README doc.

Search quality

Can-of-worms #1.

Before we discuss performance let's talk about search quality, because speed is irrelevant when your results are a strange medly of "Oh yeah!" and "WTF?".

Search quality is very subjective. What constitutes a good top match in a "typeahead/auto-suggest" case can be a poor match in a "search/find-all" scenario. Some solutions optimize for the latter, some for the former. It's common to find knobs that skew the results in either direction, but these are often by-feel and imperfect, being little more than a proxy to producing a single, composite match "score".

Let's take a look at some matches produced by the most popular fuzzy search library, Fuse.js and a some others for which match highlighting is implemented in the demo.

TODO...

Performance

Can-of-worms #2.

I've tried to follow any "best performance" advice when I could find it in each library's docs, but it's a certainty that some stones were left unturned when implementing ~20 different search engines.

The task:

Given a diverse list of 162,000 words and phrases, assume a Latin/Western European charset (can skip any diacritics/accents normalization)
Do a case-insensitive, partial/fuzzy match of the search string "super ma"
Sort the results in the most sensible way, following the Principle of least astonishment
Optionally highlight the matched substrings in each result
Bonus points for matches with out-of-order terms
Do it with the fewest resources (CPU and RAM)

Lib	Stars	Size (min)	Init	Search	Heap (peak)	Retained
uFuzzy (try)	★ 0	2.5KB	0.3ms	11.7ms	7.7MB	7.5MB
Fuse.js (try)	★ 14.8k	23.5KB	34ms	600ms	20.6MB	14.2MB
FlexSearch (Light) (try)	★ 8.9k	5.9KB	3500ms	8.1ms	322MB	323MB
Lunr.js (try)	★ 8.2k	29.4KB	1700ms	8.7ms	128MB	127MB
LyraSearch (try)	★ 3.3k
match-sorter (try)	★ 3.1k	7.3KB	0.03ms	125ms	13.1MB	12.9MB
fuzzysort (try)	★ 3k	5.5KB	50ms	12ms	54.7MB	54.4MB
fuzzysearch (try)	★ 2.6k
Elasticlunr.js (try)	★ 1.9k	18.1KB	1000ms	1.5ms	73.6MB	73.4MB
MiniSearch (try)	★ 1.5k	22.4KB	575ms	1ms	70.2MB	70MB
Fuzzyset (try)	★ 1.3k	2.8KB	2900ms	31ms	246MB	244MB
search-index (try)	★ 1.3k
LiquidMetal (try)	★ 285
ItemJS (try)	★ 260
FuzzySearch (try)	★ 184
FuzzySearch2 (try)	★ 173
QuickScore (try)	★ 131	9.1KB	26ms	155ms	26.1MB	18.7MB
fzy (try)	★ 115
fuzzyMatch (try)	★ 0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

▒ μFuzzy

Introduction

Features

Demos

Installation

Node

Browser

Usage

A biased appraisal of similar work

Search quality

Performance

About

Releases

Packages

Languages

License

xukaijie111/uFuzzy

Folders and files

Latest commit

History

Repository files navigation

▒ μFuzzy

Introduction

Features

Demos

Installation

Node

Browser

Usage

A biased appraisal of similar work

Search quality

Performance

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages