This Java library performs 'natural order' comparisons of strings. Copyright (C) 2016 by Pierre-Luc Rigaux [email protected].
It is based on the work of:
- Pierre-Luc Paour https://github.com/paour/natorder
- David Koelle http://www.davekoelle.com/alphanum.html
This article by Jeff Atwood on Coding Horror explains why a natural ordering library is needed.
As humans, we sort strings differently than computers do. This library helps to sort strings in natural order. Natural order is the way humans would sort elements, as opposed to the default computer sorting order, which follows the "unnatural" ASCII order.
"Natural sort" is the widely-used term for sorting "image9.jpg" as being less than "image10.jpg". it is "natural" because it is how a human being would sort them, as opposed to the unnatural "ascii-betical" sorting that computers do by default. codinghorror.com/blog/archives/001018.html – Kip Aug 11 '09 at 19:28
I'd like some kind of string comparison function that preserves natural sort order¹. Is there anything like this built into Java? I can't find anything in the String class, and the Comparator class only knows of two implementations.
The Alphanum Algorithm is an improved sorting algorithm for strings containing numbers. Instead of sorting numbers in ASCII order like a standard sort, this algorithm sorts numbers in numeric order.
ASCII Order | Natural Order |
---|---|
z1.doc | z1.doc |
z101.doc | z2.doc |
z102.doc | z4.doc |
z11.doc | z5.doc |
z12.doc | z6.doc |
z2.doc | z11.doc |
z4.doc | z12.doc |
z5.doc | z101.doc |
z6.doc | z102.doc |
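The chunk-by-chunk idea behind this kind of ordering can be sketched as follows. This is a minimal illustration, not this library's implementation: the class `NaturalOrderSketch` and all of its method names are hypothetical, and it only handles plain ASCII digit runs (no negative numbers, decimals, white space rules or collators).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: digit runs compare as numbers, all other characters by char code.
final class NaturalOrderSketch {

    static final Comparator<String> NATURAL = NaturalOrderSketch::compare;

    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            if (isDigit(a.charAt(i)) && isDigit(b.charAt(j))) {
                // Both sides start a digit run: compare the two runs as numbers.
                int endA = endOfDigits(a, i);
                int endB = endOfDigits(b, j);
                int cmp = compareDigitRuns(a.substring(i, endA), b.substring(j, endB));
                if (cmp != 0) {
                    return cmp;
                }
                i = endA;
                j = endB;
            } else {
                // Otherwise compare one character at a time.
                int cmp = Character.compare(a.charAt(i), b.charAt(j));
                if (cmp != 0) {
                    return cmp;
                }
                i++;
                j++;
            }
        }
        // The string with characters left over sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static int endOfDigits(String s, int from) {
        int end = from;
        while (end < s.length() && isDigit(s.charAt(end))) {
            end++;
        }
        return end;
    }

    // Drops leading zeros, then the longer run is the larger number.
    static int compareDigitRuns(String x, String y) {
        x = x.replaceFirst("^0+(?=\\d)", "");
        y = y.replaceFirst("^0+(?=\\d)", "");
        if (x.length() != y.length()) {
            return Integer.compare(x.length(), y.length());
        }
        return x.compareTo(y);
    }

    static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(NATURAL);
        return copy;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("z101.doc", "z4.doc", "z102.doc", "z5.doc",
                "z1.doc", "z11.doc", "z6.doc", "z2.doc", "z12.doc");
        // Prints the file names in the natural order shown in the table above.
        System.out.println(sorted(files));
    }
}
```

In this sketch two digit runs that differ only by leading zeros compare as equal, which is one possible design choice among several.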
- Negative number
- Space insensitive
- Switch between collators
The comparator compares numbers in their natural order, that is, the order we learnt at school.
0 < 1 < 3 < 10 < 23 < 134 < 123454634563472345345653456347
Unlike some one-liner scripts, the comparator doesn't transform the numeric string into a numeric value. This has two benefits:
- Allows numbers beyond the limits of the language's native numeric types
- Increases calculation performance
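One way to compare numbers without parsing them is to work on the digit strings directly: once leading zeros are dropped, the longer string is the larger number, and strings of equal length can be compared character by character. The sketch below illustrates that idea under the assumption of non-negative integers made of ASCII digits; `DigitStringSketch` and `compareDigits` are hypothetical names, not part of this library.

```java
// Hypothetical sketch: compare non-negative integers given as digit strings.
final class DigitStringSketch {

    static int compareDigits(String a, String b) {
        String x = stripLeadingZeros(a);
        String y = stripLeadingZeros(b);
        if (x.length() != y.length()) {
            // More digits means a larger number.
            return Integer.compare(x.length(), y.length());
        }
        // Same number of digits: character order equals numeric order.
        return x.compareTo(y);
    }

    // Removes leading zeros but always keeps at least one digit.
    static String stripLeadingZeros(String digits) {
        int i = 0;
        while (i < digits.length() - 1 && digits.charAt(i) == '0') {
            i++;
        }
        return digits.substring(i);
    }

    public static void main(String[] args) {
        String huge = "123454634563472345345653456347";
        // Long.parseLong(huge) would throw NumberFormatException: the value does not fit in a long.
        System.out.println(compareDigits("134", huge) < 0);
    }
}
```

Because no numeric type is involved, the length of the digit string is bounded only by the length of the input.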
The comparator takes care of strings representing negative numbers, which means that it considers "-10" to be smaller than "-1".
A number is considered negative if the string starts with a hyphen ('-') or if there is a whitespace character before the hyphen.
Negative: "-456", " -43", "test -467"
Not negative: "Something-45"
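The rule above can be sketched as a small predicate. This is only an illustration of the stated rule, with the added assumption that the hyphen must be directly followed by a digit; `NegativeSignSketch`, `isNegativeSign` and `containsNegativeNumber` are hypothetical names.

```java
// Hypothetical sketch of the "hyphen at the start or after white space" rule.
final class NegativeSignSketch {

    // True if the hyphen at the given index introduces a negative number.
    static boolean isNegativeSign(String s, int index) {
        if (index < 0 || index + 1 >= s.length()) {
            return false;
        }
        if (s.charAt(index) != '-') {
            return false;
        }
        // Assumption: a sign must be followed directly by a digit.
        if (!Character.isDigit(s.charAt(index + 1))) {
            return false;
        }
        return index == 0 || Character.isWhitespace(s.charAt(index - 1));
    }

    // True if any hyphen in the string qualifies as a negative sign.
    static boolean containsNegativeNumber(String s) {
        for (int i = s.indexOf('-'); i >= 0; i = s.indexOf('-', i + 1)) {
            if (isNegativeSign(s, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsNegativeNumber("test -467"));
        System.out.println(containsNegativeNumber("Something-45"));
    }
}
```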
The comparator also takes care of the decimal part of the number. Honestly, I don't know when this is useful in real life; nevertheless, for the sake of consistency, I decided to include this feature.
0 < 1 < 3 < 10 < 23 < 134 < 123454634563472345345653456347
As you know, leading and trailing zeros have no numerical significance, but they may have an aesthetic one. The comparator will ignore leading and trailing zeros if mode ASDF is chosen.
The comparator does not take white space in front of numbers into account.
- all leading and trailing whitespace of both the expectedString and the examined string are ignored
- any remaining whitespace, appearing within either string, is collapsed to a single space before comparison

For example:
assertThat(" my\tfoo bar ", equalToIgnoringWhiteSpace(" my foo bar"))
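The two whitespace rules quoted above (trim both ends, collapse inner runs to a single space) can be sketched in plain Java. The helper below is hypothetical and only illustrates the quoted rule; it is not taken from this library or from Hamcrest.

```java
// Hypothetical sketch of the whitespace normalization described above.
final class WhitespaceSketch {

    static String normalize(String s) {
        // trim() drops leading and trailing whitespace; the regex collapses inner runs.
        return s.trim().replaceAll("\\s+", " ");
    }

    static boolean equalIgnoringWhitespace(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(equalIgnoringWhitespace(" my\tfoo  bar ", " my foo bar"));
    }
}
```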
The comparator does not take zeros in front of numbers into account.
The comparator can be adjusted to compare strings lexicographically, according to a specific locale.
The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.
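For reference, this is how the standard `java.text.Collator` behaves on its own. The example uses only JDK classes; how a collator is passed to this library's comparator is not shown because that part of its API is not described here. `CollatorSketch` is a hypothetical name.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: locale-sensitive sorting with the JDK's Collator.
final class CollatorSketch {

    static List<String> sortWith(Locale locale, List<String> input) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zebra", "Apple", "\u00e9clair", "banana");

        List<String> byCodePoint = new ArrayList<>(words);
        byCodePoint.sort(null); // natural String order: "\u00e9clair" lands after "zebra"
        System.out.println(byCodePoint);

        // With a French collator the accented word sorts between "banana" and "zebra".
        System.out.println(sortWith(Locale.FRENCH, words));
    }
}
```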
The comparator can be adjusted to compare strings ignoring case differences.
Note that this method does not take locale into account, and will result in an unsatisfactory ordering for certain locales. The java.text package provides collators to allow locale-sensitive ordering.
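The note above comes from the JDK documentation of case-insensitive string comparison. As a small illustration using only the standard `String.CASE_INSENSITIVE_ORDER` comparator (the class name `CaseInsensitiveSketch` is hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: case-insensitive ordering with the JDK's built-in comparator.
final class CaseInsensitiveSketch {

    static List<String> sortIgnoringCase(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Cherry", "apple");
        // Default String order puts the capitalized word first; ignoring case does not.
        System.out.println(sortIgnoringCase(words));
    }
}
```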
In my specific case, I have software version strings that I want to sort. So I want "1.2.10.5" to be considered greater than "1.2.9.1".
¹ By "natural" sort order, I mean it compares strings the way a human would compare them, as opposed to "ascii-betical" sort ordering that only makes sense to programmers. In other words, "image9.jpg" is less than "image10.jpg", and "album1set2page9photo1.jpg" is less than "album1set2page10photo5.jpg", and "1.2.9.1" is less than "1.2.10.5".
Sometimes they accept and skip whitespace, skip leading zeros and, most importantly, place shorter strings before longer strings when they are equivalent. The string 1.020 will then be placed after 1.20. If you're using it for determining if two versions are equal you can get a false negative in this case, i.e. when checking that compareTo() returns 0.
https://github.com/paour/natorder
If you want to sort in reverse order, you just have to call the reverse() method. It leverages the Java 8 utility method.
NaturalComparator naturalComparator = new NaturalComparator();
Comparator<String> comparator = naturalComparator.reverse();
Here is a summary of the comparison modes:
Case | Description | Example |
---|---|---|
PRIMARY | Looks only at the numeric value of the number. Treats leading and trailing zeros as non-significant. | 'Doc 5.doc' = 'Doc5.doc'; 'Doc 5.doc' = 'Doc05.doc'; 'Doc 5.doc' < 'Doc05.2.doc' |
SECONDARY | If the strings are PRIMARY equal, the comparator looks at white space and at leading and trailing zeros around numbers. | 'Doc 5.doc' > 'Doc5.doc'; 'Doc 5.doc' < 'Doc05.doc'; 'Doc 5.doc' < 'Doc05.2.doc' |
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

- The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
- Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
- This notice may not be removed or altered from any source distribution.