plrigaux/Comparators

Natural Comparator

This Java library performs 'natural order' comparisons of strings. Copyright (C) 2016 by Pierre-Luc Rigaux.

It is based on the work of :

This article by Jeff Atwood on Coding Horror explains why a natural ordering library is needed.

Description

As humans, we sort strings differently than computers do. This library helps to sort strings in natural order, that is, the way a human would order them, as opposed to the default computer ordering, which follows the "unnatural" ASCII order.

"Natural sort" is the widely-used term for sorting "image9.jpg" as being less than "image10.jpg". it is "natural" because it is how a human being would sort them, as opposed to the unnatural "ascii-betical" sorting that computers do by default. codinghorror.com/blog/archives/001018.html – Kip Aug 11 '09 at 19:28

I'd like some kind of string comparison function that preserves natural sort order [1]. Is there anything like this built into Java? I can't find anything in the String class, and the Comparator class only knows of two implementations.

The Alphanum Algorithm is an improved sorting algorithm for strings containing numbers. Instead of sorting numbers in ASCII order like a standard sort, this algorithm sorts numbers in numeric order.

| ASCII Order | Natural Order |
|-------------|---------------|
| z1.doc      | z1.doc        |
| z101.doc    | z2.doc        |
| z102.doc    | z4.doc        |
| z11.doc     | z5.doc        |
| z12.doc     | z6.doc        |
| z2.doc      | z11.doc       |
| z4.doc      | z12.doc       |
| z5.doc      | z101.doc      |
| z6.doc      | z102.doc      |
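
The idea can be sketched in a few lines of Java. This is only an illustration and not the library's actual implementation: the class `NaturalOrderSketch` and its helpers are hypothetical names. It walks both strings in parallel, compares runs of digits by numeric value and everything else character by character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a natural-order comparator (not the library's NaturalComparator).
class NaturalOrderSketch implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Locate the end of the digit run in each string.
                int endA = i, endB = j;
                while (endA < a.length() && isDigit(a.charAt(endA))) endA++;
                while (endB < b.length() && isDigit(b.charAt(endB))) endB++;
                int cmp = compareDigitRuns(a.substring(i, endA), b.substring(j, endB));
                if (cmp != 0) return cmp;
                i = endA;
                j = endB;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // Whichever string still has characters left sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Compares two digit strings by value without converting them to numbers.
    private static int compareDigitRuns(String x, String y) {
        x = stripLeadingZeros(x);
        y = stripLeadingZeros(y);
        if (x.length() != y.length()) return Integer.compare(x.length(), y.length());
        return x.compareTo(y);
    }

    private static String stripLeadingZeros(String s) {
        int k = 0;
        while (k < s.length() - 1 && s.charAt(k) == '0') k++;
        return s.substring(k);
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>(Arrays.asList(
                "z101.doc", "z4.doc", "z102.doc", "z5.doc", "z1.doc",
                "z11.doc", "z6.doc", "z2.doc", "z12.doc"));
        files.sort(new NaturalOrderSketch());
        // [z1.doc, z2.doc, z4.doc, z5.doc, z6.doc, z11.doc, z12.doc, z101.doc, z102.doc]
        System.out.println(files);
    }
}
```

The sketch deliberately ignores negative numbers, decimals, white space and locale, which the sections below discuss.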
  1. Negative number
  2. Space insensitive
  3. Switch between collators

Numbers

The comparator compares numbers in their natural order, basically the order that we learnt at school.


 0 < 1 < 3 < 10 < 23 < 134 < 123454634563472345345653456347
  

Unlike some one-liner scripts, the comparator doesn't convert the numeric string into a numeric value. This has two benefits:

  • Allows numbers beyond the limits of the language's native numeric types
  • Increases calculation performance
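
As an illustration of how digits can be compared without parsing, here is a minimal sketch. It assumes both inputs contain only ASCII digits, and the class and method names are hypothetical rather than taken from the library. After skipping leading zeros, the longer run is the larger number; runs of equal length are compared digit by digit.

```java
// Hypothetical helper: compares digit runs by numeric value without parsing them.
class DigitRunComparison {

    // Compares a[aStart, aEnd) with b[bStart, bEnd); both ranges must contain only digits.
    static int compareDigitRuns(String a, int aStart, int aEnd,
                                String b, int bStart, int bEnd) {
        // Leading zeros do not change the value, so skip them.
        while (aStart < aEnd && a.charAt(aStart) == '0') aStart++;
        while (bStart < bEnd && b.charAt(bStart) == '0') bStart++;

        // With no leading zeros, more digits means a larger number.
        int lenA = aEnd - aStart;
        int lenB = bEnd - bStart;
        if (lenA != lenB) return Integer.compare(lenA, lenB);

        // Same length: the first differing digit decides.
        for (int k = 0; k < lenA; k++) {
            int cmp = Character.compare(a.charAt(aStart + k), b.charAt(bStart + k));
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    static int compareDigitRuns(String a, String b) {
        return compareDigitRuns(a, 0, a.length(), b, 0, b.length());
    }

    public static void main(String[] args) {
        String huge = "123454634563472345345653456347";
        System.out.println(compareDigitRuns("134", huge) < 0);  // true
        System.out.println(compareDigitRuns("007", "7") == 0);  // true
        try {
            Long.parseLong(huge);
        } catch (NumberFormatException e) {
            // The value does not fit in a long, yet the comparison above still works.
            System.out.println("too large for long");
        }
    }
}
```

Working on index ranges avoids allocating substrings or number objects, which is one plausible source of the performance benefit mentioned above.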

Negative numbers

The comparator takes care of strings representing negative numbers, which means that it considers "-10" to be smaller than "-1".

A hyphen ('-') is treated as a negative sign if the string starts with it or if there is a whitespace character immediately before it.

Negative: "-456", " -43", "test -467"

Not negative: "Something-45"
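
The rule above could be checked roughly as follows. This is a sketch under assumptions: `isNegativeSign` is a hypothetical helper, not part of the library, and it additionally requires a digit right after the hyphen, which the text implies but does not state.

```java
// Hypothetical helper illustrating the negative-sign rule described above.
class NegativeSignSketch {

    // Returns true when the character at index i is a hyphen that should be read as a minus sign.
    static boolean isNegativeSign(String s, int i) {
        if (i < 0 || i >= s.length() || s.charAt(i) != '-') return false;
        // Assumption: a minus sign must be directly followed by a digit.
        boolean digitFollows = i + 1 < s.length() && Character.isDigit(s.charAt(i + 1));
        // Rule from the text: the hyphen starts the string or follows a whitespace character.
        boolean startOrAfterSpace = i == 0 || Character.isWhitespace(s.charAt(i - 1));
        return digitFollows && startOrAfterSpace;
    }

    public static void main(String[] args) {
        System.out.println(isNegativeSign("-456", 0));          // true
        System.out.println(isNegativeSign(" -43", 1));          // true
        System.out.println(isNegativeSign("test -467", 5));     // true
        System.out.println(isNegativeSign("Something-45", 9));  // false
    }
}
```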

Decimal numbers

The comparator also takes care of the decimal part of a number. Honestly, I don't know when this is useful in real life; nevertheless, for the sake of consistency, I decided to include this feature.


 0 < 1 < 3 < 10 < 23 < 134 < 123454634563472345345653456347
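
Decimal digits need a different rule than integer digits, because after the dot the position of a digit matters more than the number of digits. The following sketch shows one way to compare two fractional parts without parsing them; `compareFractions` is a hypothetical name, and the inputs are assumed to be the digits that follow the dot.

```java
// Hypothetical helper: compares the digits after the decimal point of two numbers.
class FractionComparison {

    // x and y hold only the fractional digits, e.g. "5" for .5 and "25" for .25.
    static int compareFractions(String x, String y) {
        int n = Math.max(x.length(), y.length());
        for (int k = 0; k < n; k++) {
            // A missing digit is treated as a trailing zero, which has no numerical significance.
            char cx = k < x.length() ? x.charAt(k) : '0';
            char cy = k < y.length() ? y.charAt(k) : '0';
            if (cx != cy) return Character.compare(cx, cy);
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareFractions("5", "25") > 0);   // true: .5 > .25
        System.out.println(compareFractions("50", "5") == 0);  // true: .50 == .5
        System.out.println(compareFractions("05", "5") < 0);   // true: .05 < .5
    }
}
```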
 
 

Leading and trailing zeros

Leading zeros of the integer part and trailing zeros of the decimal part have no numerical significance, but they may have an aesthetic one. The comparator will ignore leading and trailing zeros if the mode ASDF is chosen.

Space insensitive

The comparator ignores white space in front of numbers.

  • all leading and trailing whitespace of both the expectedString and the examined string are ignored
  • any remaining whitespace, appearing within either string, is collapsed to a single space before comparison

For example:

assertThat(" my\tfoo bar ", equalToIgnoringWhiteSpace(" my foo bar"))
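
The two rules quoted above can be reproduced with standard `String` methods. This sketch only mirrors the quoted description; `normalizeWhitespace` is a hypothetical helper and may differ from what the comparator does internally.

```java
// Hypothetical helper mirroring the whitespace rules quoted above.
class WhitespaceSketch {

    static String normalizeWhitespace(String s) {
        // trim() drops leading and trailing whitespace;
        // the regex collapses every remaining whitespace run to a single space.
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String a = normalizeWhitespace(" my\tfoo bar ");
        String b = normalizeWhitespace(" my foo bar");
        System.out.println(a.equals(b));  // true
    }
}
```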

Leading zeros insensitive

The comparator ignores zeros in front of numbers.

Custom comparator

Internationalisation (i18n)

The comparator can be adjusted to compare strings lexicographically according to a specific locale.

The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.
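
To show what locale-sensitive comparison changes, here is a small example that uses only the JDK's `java.text.Collator`. It does not show how the collator is passed to this library's comparator, since that API is not described here; the class `CollatorSketch` is purely illustrative.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustrative only: contrasts code-point ordering with French collation.
class CollatorSketch {

    static List<String> sortFrench(List<String> words) {
        List<String> out = new ArrayList<>(words);
        // Collator implements Comparator<Object>, so it can be passed to List.sort.
        out.sort(Collator.getInstance(Locale.FRENCH));
        return out;
    }

    public static void main(String[] args) {
        // "\u00e9t\u00e9" is the French word for "summer", written with escapes.
        List<String> words = Arrays.asList("zoo", "\u00e9t\u00e9", "apple");

        List<String> byCodePoint = new ArrayList<>(words);
        Collections.sort(byCodePoint);
        System.out.println(byCodePoint);        // the accented word sorts after "zoo"

        System.out.println(sortFrench(words));  // the accented word sorts between "apple" and "zoo"
    }
}
```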

Case insensitive

The comparator can be adjusted to compare strings while ignoring case differences.

Note that this method does not take locale into account, and will result in an unsatisfactory ordering for certain locales. The java.text package provides collators to allow locale-sensitive ordering.
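
For reference, the JDK's own case-insensitive ordering, to which the note above applies, is `String.CASE_INSENSITIVE_ORDER`. The example below uses it directly; whether this library delegates to it is an assumption, and `CaseInsensitiveSketch` is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: contrasts the default String ordering with a case-insensitive one.
class CaseInsensitiveSketch {

    static List<String> sortIgnoringCase(List<String> words) {
        List<String> out = new ArrayList<>(words);
        out.sort(String.CASE_INSENSITIVE_ORDER);
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");

        List<String> caseSensitive = new ArrayList<>(words);
        Collections.sort(caseSensitive);
        System.out.println(caseSensitive);            // [Banana, apple, cherry]

        System.out.println(sortIgnoringCase(words));  // [apple, Banana, cherry]
    }
}
```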

Lists table order

In my specific case, I have software version strings that I want to sort. So I want "1.2.10.5" to be considered greater than "1.2.9.1".

[1] By "natural" sort order, I mean it compares strings the way a human would compare them, as opposed to "ascii-betical" sort ordering that only makes sense to programmers. In other words, "image9.jpg" is less than "image10.jpg", and "album1set2page9photo1.jpg" is less than "album1set2page10photo5.jpg", and "1.2.9.1" is less than "1.2.10.5".

Sometimes they accept and skip whitespace, skip leading zeros and most importantly places shorter strings before longer strings when they are equivalent. The string 1.020 will then be placed after 1.20. If you're using it for determining if two versions are equal you can get a false negative in this case. I.e. when checking that compareTo() returns 0.

https://github.com/paour/natorder

Reverse order

If you want to sort in reverse order, you just have to call the reverse() method. It leverages the Java 8 utility method.

NaturalComparator naturalComparator = new NaturalComparator();

Comparator<String> comparator = naturalComparator.reverse();
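
The Java 8 utility in question is presumably `Comparator.reversed()`, a default method available on every comparator. The sketch below demonstrates it with a plain `String::compareTo` comparator standing in for the natural comparator; `ReverseOrderSketch` is a hypothetical class and not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: reversing any Comparator<String> with the Java 8 default method.
class ReverseOrderSketch {

    static List<String> sortReversed(List<String> items, Comparator<String> base) {
        List<String> out = new ArrayList<>(items);
        // reversed() returns a comparator that imposes the opposite ordering of base.
        out.sort(base.reversed());
        return out;
    }

    public static void main(String[] args) {
        // Stand-in comparator; a natural-order comparator would be passed the same way.
        Comparator<String> base = String::compareTo;
        System.out.println(sortReversed(Arrays.asList("a", "c", "b"), base));  // [c, b, a]
    }
}
```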

Summary

The comparison modes and flags are summarized below.

| Case | Description | Example |
|------|-------------|---------|
| PRIMARY | Look only at the numeric value of the numbers. Treat leading and trailing zeros as non-significant. | 'Doc 5.doc' = 'Doc5.doc'; 'Doc 5.doc' = 'Doc05.doc'; 'Doc 5.doc' < 'Doc05.2.doc' |
| SECONDARY | If the strings are PRIMARY equal, the comparator looks at white spaces and at leading and trailing zeros around numbers. | 'Doc 5.doc' > 'Doc5.doc'; 'Doc 5.doc' < 'Doc05.doc'; 'Doc 5.doc' < 'Doc05.2.doc' |
| LTRIM | Ignore leading white spaces at the beginning of the string. | |
| RTRIM | Ignore trailing white spaces at the end of the string. | |
| TRIM | Ignore leading and trailing white spaces at the beginning and the end of the string. Note: combination of LTRIM and RTRIM. | '   asdf', 'asdf  ', '  asdf    ', 'asdf' |
| NEGATIVE_NUMBER | Treat the hyphen before the number as a negative sign. | |
| RATIONAL_NUMBER | Handle the portion after the dot '.' as the decimal part. | |
| REAL_NUMBER | Combination of the NEGATIVE_NUMBER and RATIONAL_NUMBER flags. | |

Disclaimer

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

About

Sort string in numeric order instead of ASCII order
