This Java library performs 'natural order' comparisons of strings. Copyright (C) 2016 by Pierre-Luc Rigaux.
It is based on the work of:
- Pierre-Luc Paour https://github.com/paour/natorder
- David Koelle http://www.davekoelle.com/alphanum.html
This article by Jeff Atwood on Coding Horror explains why a natural ordering library is needed.
Humans sort strings differently than computers do. This library helps sort strings in natural order, that is, the order a human would choose, as opposed to the default computer ordering, which follows the unnatural ASCII order.
"Natural sort" is the widely-used term for sorting "image9.jpg" as being less than "image10.jpg". it is "natural" because it is how a human being would sort them, as opposed to the unnatural "ascii-betical" sorting that computers do by default. codinghorror.com/blog/archives/001018.html – Kip Aug 11 '09 at 19:28
I'd like some kind of string comparison function that preserves natural sort order [1]. Is there anything like this built into Java? I can't find anything in the String class, and the Comparator class only knows of two implementations.
The Alphanum Algorithm is an improved sorting algorithm for strings containing numbers. Instead of sorting numbers in ASCII order like a standard sort, this algorithm sorts numbers in numeric order.
ASCII Order | Natural Order |
---|---|
z1.doc | z1.doc |
z101.doc | z2.doc |
z102.doc | z4.doc |
z11.doc | z5.doc |
z12.doc | z6.doc |
z2.doc | z11.doc |
z4.doc | z12.doc |
z5.doc | z101.doc |
z6.doc | z102.doc |
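The difference between the two orderings can be reproduced with a few lines of Java. The sketch below is not this library's implementation: `NaturalOrderSketch` and `compareNatural` are hypothetical names, and the comparison only handles plain ASCII digit runs (no negative numbers, decimals, or white space options).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class NaturalOrderSketch {

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Hypothetical helper: digit runs compare by numeric value, everything else character by character. */
    static int compareNatural(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isAsciiDigit(ca) && isAsciiDigit(cb)) {
                // Locate the end of the digit run in each string.
                int endA = i, endB = j;
                while (endA < a.length() && isAsciiDigit(a.charAt(endA))) endA++;
                while (endB < b.length() && isAsciiDigit(b.charAt(endB))) endB++;
                // Skip leading zeros, keeping at least one digit.
                while (i < endA - 1 && a.charAt(i) == '0') i++;
                while (j < endB - 1 && b.charAt(j) == '0') j++;
                // More digits means a larger number; equal lengths compare digit by digit.
                int byLength = Integer.compare(endA - i, endB - j);
                if (byLength != 0) return byLength;
                int byDigits = a.substring(i, endA).compareTo(b.substring(j, endB));
                if (byDigits != 0) return byDigits;
                i = endA;
                j = endB;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // The string with characters left over sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                "z101.doc", "z4.doc", "z102.doc", "z5.doc", "z1.doc",
                "z11.doc", "z6.doc", "z2.doc", "z12.doc"));

        Collections.sort(names); // default String ordering
        System.out.println(names);
        // [z1.doc, z101.doc, z102.doc, z11.doc, z12.doc, z2.doc, z4.doc, z5.doc, z6.doc]

        names.sort(NaturalOrderSketch::compareNatural);
        System.out.println(names);
        // [z1.doc, z2.doc, z4.doc, z5.doc, z6.doc, z11.doc, z12.doc, z101.doc, z102.doc]
    }
}
```

Note that this sketch returns 0 for "z05.doc" and "z5.doc", so it is not consistent with `equals`.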
- Negative number
- Space insensitive
- Switch between collators
The comparator compares numbers in their natural order, which is basically the order we learned at school.
0 < 1 < 3 < 10 < 23 < 134 < 123454634563472345345653456347
Unlike some one-liner scripts, the comparator doesn't convert the numeric string into a numeric value. This has two benefits:
- Allows numbers beyond the limits of the language native numeric type
- Increases calculation performance
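To illustrate the first benefit, here is a minimal sketch of comparing two unsigned digit strings without ever parsing them. The class and method names are hypothetical and the code is not taken from the library; it assumes both inputs contain only ASCII digits.

```java
final class DigitStringCompare {

    /** Drops leading zeros but always keeps at least one digit. */
    static String stripLeadingZeros(String digits) {
        int i = 0;
        while (i < digits.length() - 1 && digits.charAt(i) == '0') i++;
        return digits.substring(i);
    }

    /** Compares two non-negative integers given as digit strings of any length. */
    static int compareDigits(String a, String b) {
        String x = stripLeadingZeros(a);
        String y = stripLeadingZeros(b);
        // A number with more significant digits is larger.
        if (x.length() != y.length()) return Integer.compare(x.length(), y.length());
        // Same length: the first differing digit decides.
        return Integer.signum(x.compareTo(y));
    }

    public static void main(String[] args) {
        String huge = "123454634563472345345653456347";
        System.out.println(compareDigits("134", huge)); // -1
        try {
            Long.parseLong(huge); // too large for a long
        } catch (NumberFormatException e) {
            System.out.println("does not fit in a long");
        }
    }
}
```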
The comparator takes care of strings representing negative numbers, which means that it considers "-10" smaller than "-1".
It treats a number as negative if a hyphen ('-') directly precedes a digit and if the hyphen:
- is at the beginning of the string, OR
- is preceded by a whitespace.
"-456", " -43", "test -467"
If those conditions aren't met, the hyphen is treated as a mere separator.
"Something-45", "123-456"
The comparator also takes care of the decimal part of the number. Honestly, I don't know when this can be useful in real life; nevertheless, for the sake of consistency, I decided to include this feature.
"0.3" < "0.33" < "3.456456345634563423467" < "10"
As you know, leading and trailing zeros have no numerical significance, but they may have an aesthetic one. Zero padding can also be used to make numerical values sort in natural order. The comparator ignores leading and trailing zeros if PRIMARY mode is chosen.
The comparator has different ways to handle white spaces.
Note: for now the comparator only handles white spaces and NOT other horizontal spaces such as the non-breaking space \u00A0.
The comparator doesn't take into account any white spaces in the string.
The comparator collapses series of consecutive white spaces and treats them as one space.
The comparator can ignore white space at the beginning, at the end, or at both ends of the string.
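The effect of these white space modes can be pictured as a normalization applied to each string before comparison. The sketch below uses plain `String.replaceAll` with hypothetical helper names; whether the library normalizes up front or skips white space while comparing is not stated here, so treat this as an illustration only. Java's default `\s` class does not match \u00A0, which is consistent with the note above.

```java
final class WhitespaceModes {

    /** Space insensitive: drop every white space. */
    static String removeAll(String s) {
        return s.replaceAll("\\s+", "");
    }

    /** Space repetition insensitive: collapse each run of white spaces into one space. */
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }

    /** Ignore white space at the beginning of the string. */
    static String trimLeft(String s) {
        return s.replaceAll("^\\s+", "");
    }

    /** Ignore white space at the end of the string. */
    static String trimRight(String s) {
        return s.replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        System.out.println(removeAll(" a b c ").equals("abc"));                       // true
        System.out.println(collapse("ab   c").equals("ab c"));                        // true
        System.out.println(trimLeft(trimRight("  Doc5.doc  ")).equals("Doc5.doc"));   // true
    }
}
```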
The comparator can be adjusted to compare strings lexicographically, according to a specific locale.
The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.
The comparator can be adjusted to compare strings ignoring case differences.
Note that this method does not take locale into account, and will result in an unsatisfactory ordering for certain locales. The java.text package provides collators to allow locale-sensitive ordering.
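The difference between these orderings is easy to see with the JDK alone. The sketch below uses only standard classes (`String.CASE_INSENSITIVE_ORDER` and `java.text.Collator`); the class `OrderingDemo` and its `sorted` helper are made up for the example, and how this library plugs such comparators in is not shown.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class OrderingDemo {

    /** Returns a sorted copy of the given words, leaving the input untouched. */
    static List<String> sorted(List<String> words, Comparator<? super String> order) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        // "\u00e9toile" is "étoile", written with an escape to avoid source encoding issues.
        List<String> words = Arrays.asList("zoo", "\u00e9toile", "Eau");

        // Case-insensitive but locale-blind: 'é' sorts after 'z'.
        System.out.println(sorted(words, String.CASE_INSENSITIVE_ORDER));
        // [Eau, zoo, étoile]

        // Locale-sensitive: 'é' is ordered next to 'e'.
        System.out.println(sorted(words, Collator.getInstance(Locale.FRENCH)));
        // [Eau, étoile, zoo]
    }
}
```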
In my specific case, I have software version strings that I want to sort. So I want "1.2.10.5" to be considered greater than "1.2.9.1".
[1] By "natural" sort order, I mean it compares strings the way a human would compare them, as opposed to "ascii-betical" sort ordering that only makes sense to programmers. In other words, "image9.jpg" is less than "image10.jpg", and "album1set2page9photo1.jpg" is less than "album1set2page10photo5.jpg", and "1.2.9.1" is less than "1.2.10.5"
Sometimes they accept and skip whitespace, skip leading zeros and, most importantly, place shorter strings before longer strings when they are equivalent. The string 1.020 will then be placed after 1.20. If you're using it to determine whether two versions are equal, you can get a false negative in this case, i.e. when checking that compareTo() returns 0.
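For the version-string case, one way to avoid that false negative is to compare the dot-separated segments by numeric value. The sketch below is a standalone illustration with hypothetical names, not this library's behaviour; it assumes every segment is a non-empty run of digits and, unlike the comparator described here, it does convert segments to numbers (`BigInteger`, so length is not a concern).

```java
import java.math.BigInteger;

final class VersionCompare {

    /** Compares versions such as "1.2.10.5" segment by segment, by numeric value. */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int k = 0; k < n; k++) {
            // A missing segment counts as 0, so "1.2" equals "1.2.0".
            BigInteger p = k < x.length ? new BigInteger(x[k]) : BigInteger.ZERO;
            BigInteger q = k < y.length ? new BigInteger(y[k]) : BigInteger.ZERO;
            int c = p.compareTo(q);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareVersions("1.2.10.5", "1.2.9.1") > 0);   // true
        System.out.println("1.2.10.5".compareTo("1.2.9.1") > 0);          // false: plain String order
        System.out.println(compareVersions("1.020", "1.20") == 0);        // true: leading zero ignored
    }
}
```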
If you want to sort in reverse order, you just have to call the reverse() method. It leverages the Java 8 utility method.
NaturalComparator naturalComparator = new NaturalComparator();
Comparator<String> comparator = naturalComparator.reverse();
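For comparison, the JDK's own way to reverse any `Comparator` since Java 8 is the default method `Comparator.reversed()`. The sketch below uses a standard comparator in place of `NaturalComparator`, so it runs without this library; whether `reverse()` delegates to `reversed()` internally is an assumption, and the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReverseDemo {

    /** Returns a copy of the items sorted in the reverse of the given order. */
    static <T> List<T> sortedDescending(List<T> items, Comparator<T> order) {
        List<T> copy = new ArrayList<>(items);
        copy.sort(order.reversed()); // Java 8 default method on Comparator
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "C", "b");
        System.out.println(sortedDescending(words, String.CASE_INSENSITIVE_ORDER));
        // [C, b, a]
    }
}
```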
Here is a summary of the available flags:
Case | Description | Example |
---|---|---|
PRIMARY | Look only at the numeric value of numbers. Treat leading and trailing zeros as non-significant. | "Doc 5.doc" = "Doc5.doc"; "Doc 5.doc" = "Doc05.doc"; "Doc 5.doc" < "Doc05.2.doc" |
SECONDARY | If the strings are PRIMARY equal, the comparator looks at white spaces and at leading and trailing zeros around numbers to differentiate them. This can be useful if you want to sort similar strings in a definitive order. | "Doc 5.doc" > "Doc5.doc"; "Doc 5.doc" < "Doc05.doc"; "Doc 5.doc" < "Doc05.2.doc" |
LTRIM | Ignore leading white spaces (at the beginning of the string). | "Doc5.doc" = " Doc5.doc" |
RTRIM | Ignore trailing white spaces (at the end of the string). | "Doc5.doc" = "Doc5.doc " |
TRIM | Ignore leading and trailing white spaces (at the beginning and the end of the string). Note: combination of LTRIM and RTRIM. | "Doc5.doc" = " Doc5.doc " |
NEGATIVE_NUMBER | Treat a hyphen before a number as a negative sign. | "-5" < "-4" |
RATIONAL_NUMBER | Handle the portion after the dot "." as the decimal part. | "10.4" < "10.45" |
REAL_NUMBER | Combination of the NEGATIVE_NUMBER and RATIONAL_NUMBER flags. | "-10.4" > "-10.45" |
SPACE_INSENSITVE | Ignore white spaces in the string. | "abc" = " a b c "; " \n abc \n" = " a b c " |
SPACE_REPETITION_INSENSITVE | Ignore repeated white spaces in the string. | "ab  c" = "ab c"; "abc" != "ab c" |
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.
Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:
- The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
- Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
- This notice may not be removed or altered from any source distribution.