Original, standard and customisable versions of the Jaro-Winkler functions.
>>> import jaro
>>> jaro.jaro_winkler_metric(u'SHACKLEFORD', u'SHACKELFORD')
0.9818181
>>> help(jaro)
Help on package jaro:

NAME
    jaro - Python translation of the original Jaro-Winkler functions.

DESCRIPTION
    The Jaro-Winkler functions compare two strings and return a score
    indicating how closely the strings match. The score ranges from 0
    (no match) to 1 (perfect match).

    Two null strings ('') will compare as equal. Strings should be
    unicode strings, and will be compared as given; the caller is
    responsible for capitalisations and trimming leading/trailing
    spaces.

    You should normally only need to use either the jaro_metric() or
    jaro_winkler_metric() functions defined here. If you want to
    implement your own, non-standard metrics, look at the comments and
    functions in the jaro.py submodule.

PACKAGE CONTENTS
    ...
    jaro
    strcmp95
    ...

FUNCTIONS
    jaro_metric(string1, string2)
        The standard, basic Jaro string metric.

    jaro_winkler_metric(string1, string2)
        The Jaro metric adjusted with Winkler's modification, which
        boosts the metric for strings whose prefixes match.

    original_metric(string1, string2)
        The same metric that would be returned from the reference
        Jaro-Winkler C code, taking as it does into account a typo
        table and adjustments for longer strings.
    ...
    custom_metric(string1, string2, typo_table, typo_scale,
                  boost_threshold, pre_len, pre_scale, longer_prob)
        Calculate the Jaro-Winkler metric with parameters of your own
        choosing.
    ...
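To make the description above concrete, here is a minimal, standalone sketch of the textbook Jaro and Jaro-Winkler calculations. It does not use or reproduce the package's own code. The function names `jaro`, `jaro_winkler` and `normalise` are hypothetical. The keyword names `pre_len`, `pre_scale` and `boost_threshold` are borrowed from the `custom_metric()` signature, but their meanings and the defaults used here (4, 0.1 and 0.7) are assumptions based on the commonly published form of the metric. The typo table and the long-string adjustment used by `original_metric()` are not modelled.

```python
def normalise(s):
    """Hypothetical helper: the caller is responsible for case and trimming."""
    return s.strip().upper()


def jaro(s1, s2):
    """Textbook Jaro similarity, from 0.0 (no match) to 1.0 (perfect match)."""
    if not s1 and not s2:
        return 1.0  # two null strings compare as equal
    if not s1 or not s2:
        return 0.0

    # Characters match only if equal and no further apart than this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len(s1)
            + matches / len(s2)
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, pre_len=4, pre_scale=0.1, boost_threshold=0.7):
    """Jaro score boosted for a shared prefix (assumed default parameters)."""
    score = jaro(s1, s2)
    if score <= boost_threshold:
        return score  # weak matches get no prefix boost
    prefix = 0
    for a, b in zip(s1[:pre_len], s2[:pre_len]):
        if a != b:
            break
        prefix += 1
    return score + prefix * pre_scale * (1.0 - score)


if __name__ == "__main__":
    print(jaro_winkler(normalise(" shackleford "), normalise("SHACKELFORD")))
```

For the SHACKLEFORD / SHACKELFORD pair, all 11 characters match with one transposition, giving a Jaro score of about 0.9697. The four-character prefix boost raises that to about 0.9818, which agrees with the session output shown above.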