Hi All,
I'm working on rewriting a batch process that is quite old.
It processes multiple files which have essentially the same data, but one has an additional text field at the end.
The files are comma-separated values (CSV) files.
Without going too far into it, I was under the impression that PatternMatchingCompositeLineMapper would support full regular expressions. After all, Java has a good pattern matching library in the Pattern and Matcher classes, and the String class's "matches" method is built on top of those regular expression tools.
What I have discovered, though, is that the supported syntax is limited to two wildcards: '*' and '?'.
From reviewing the code, it looks like a very limited, Ant-style pattern matching capability.
The result is a very inelegant solution that requires long runs of '?' with intermittent '*' to cope with whitespace of possibly unknown length.
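To make the limitation concrete, here is a minimal sketch that translates an Ant-style pattern into the equivalent regular expression and compares it with a hand-written regex. The class name GlobVersusRegex and the method globToRegex are hypothetical and not part of Spring Batch; the sketch also assumes simple CSV lines with no quoted commas. Because '*' also matches commas, a wildcard pattern cannot tell a three-field line from a four-field one, whereas a regex can.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not a Spring Batch class. */
final class GlobVersusRegex {

    /** Translates an Ant-style pattern ('*' and '?') into an equivalent regex. */
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush pending literal text, quoted so regex metacharacters stay literal.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                // '*' is zero or more characters, '?' is exactly one character.
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // DOTALL so that '.' matches any character, as the wildcards do.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern glob = globToRegex("*,*,*");
        // Assumption: fields never contain quoted commas.
        Pattern exactlyThreeFields = Pattern.compile("[^,]*(,[^,]*){2}");

        for (String line : new String[] {"a,b,c", "a,b,c,extra text"}) {
            System.out.println(line
                    + " glob=" + glob.matcher(line).matches()
                    + " regex=" + exactlyThreeFields.matcher(line).matches());
        }
    }
}
```

In this sketch the wildcard pattern accepts both lines, while the regex accepts only the three-field line, which is the distinction I need between the two file layouts.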
Granted, I could write a custom line tokenizer, but according to "The Definitive Guide to Spring Batch", that stretches the object beyond its intended concern and is not recommended. My understanding is that the author of that book is also head of the Spring Batch project.
Any chance someone would be willing to implement java.util.regex.Pattern matching functionality?
    /**
     * Lifted from AntPathMatcher in Spring Core. Tests whether or not a string
     * matches against a pattern. The pattern may contain two special
     * characters:<br>
     * '*' means zero or more characters<br>
     * '?' means one and only one character
     *
     * @param pattern pattern to match against. Must not be <code>null</code>.
     * @param str string which must be matched against the pattern. Must not be
     * <code>null</code>.
     * @return <code>true</code> if the string matches against the pattern, or
     * <code>false</code> otherwise.
     */
    public static boolean match(String pattern, String str) {
        int patIdxStart = 0;
        int patIdxEnd = pattern.length() - 1;
        int strIdxStart = 0;
        int strIdxEnd = str.length() - 1;
        char ch;

        boolean containsStar = pattern.contains("*");

        if (!containsStar) {
            // No '*'s, so we make a shortcut
            if (patIdxEnd != strIdxEnd) {
                return false; // Pattern and string do not have the same size
            }
            for (int i = 0; i <= patIdxEnd; i++) {
                ch = pattern.charAt(i);
                if (ch != '?') {
                    if (ch != str.charAt(i)) {
                        return false; // Character mismatch
                    }
                }
            }
            return true; // String matches against pattern
        }

        if (patIdxEnd == 0) {
            return true; // Pattern contains only '*', which matches anything
        }

        // Process characters before first star
        while ((ch = pattern.charAt(patIdxStart)) != '*' && strIdxStart <= strIdxEnd) {
            if (ch != '?') {
                if (ch != str.charAt(strIdxStart)) {
                    return false; // Character mismatch
                }
            }
            patIdxStart++;
            strIdxStart++;
        }
        if (strIdxStart > strIdxEnd) {
            // All characters in the string are used. Check if only '*'s are
            // left in the pattern. If so, we succeeded. Otherwise failure.
            for (int i = patIdxStart; i <= patIdxEnd; i++) {
                if (pattern.charAt(i) != '*') {
                    return false;
                }
            }
            return true;
        }

        // Process characters after last star
        while ((ch = pattern.charAt(patIdxEnd)) != '*' && strIdxStart <= strIdxEnd) {
            if (ch != '?') {
                if (ch != str.charAt(strIdxEnd)) {
                    return false; // Character mismatch
                }
            }
            patIdxEnd--;
            strIdxEnd--;
        }
        if (strIdxStart > strIdxEnd) {
            // All characters in the string are used. Check if only '*'s are
            // left in the pattern. If so, we succeeded. Otherwise failure.
            for (int i = patIdxStart; i <= patIdxEnd; i++) {
                if (pattern.charAt(i) != '*') {
                    return false;
                }
            }
            return true;
        }

        // process pattern between stars. padIdxStart and patIdxEnd point
        // always to a '*'.
        while (patIdxStart != patIdxEnd && strIdxStart <= strIdxEnd) {
            int patIdxTmp = -1;
            for (int i = patIdxStart + 1; i <= patIdxEnd; i++) {
                if (pattern.charAt(i) == '*') {
                    patIdxTmp = i;
                    break;
                }
            }
            if (patIdxTmp == patIdxStart + 1) {
                // Two stars next to each other, skip the first one.
                patIdxStart++;
                continue;
            }
            // Find the pattern between padIdxStart & padIdxTmp in str between
            // strIdxStart & strIdxEnd
            int patLength = (patIdxTmp - patIdxStart - 1);
            int strLength = (strIdxEnd - strIdxStart + 1);
            int foundIdx = -1;
            strLoop: for (int i = 0; i <= strLength - patLength; i++) {
                for (int j = 0; j < patLength; j++) {
                    ch = pattern.charAt(patIdxStart + j + 1);
                    if (ch != '?') {
                        if (ch != str.charAt(strIdxStart + i + j)) {
                            continue strLoop;
                        }
                    }
                }

                foundIdx = strIdxStart + i;
                break;
            }

            if (foundIdx == -1) {
                return false;
            }

            patIdxStart = patIdxTmp;
            strIdxStart = foundIdx + patLength;
        }

        // All characters in the string are used. Check if only '*'s are left
        // in the pattern. If so, we succeeded. Otherwise failure.
        for (int i = patIdxStart; i <= patIdxEnd; i++) {
            if (pattern.charAt(i) != '*') {
                return false;
            }
        }

        return true;
    }
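
    // Requires: import java.util.regex.Pattern;

    /**
     * Proposed but possibly oversimplified match functionality.
     *
     * @param regex regular expression to match against. Must not be <code>null</code>.
     * @param str string which must match the regular expression in full. Must not be <code>null</code>.
     * @return <code>true</code> if the entire string matches the regular expression, or
     * <code>false</code> otherwise.
     */
    public static boolean matchUsingFullRegex(final String regex, final String str) {
        if (regex == null)
            throw new NullPointerException("Regular expression cannot be null");
        if (str == null)
            throw new NullPointerException("String to match cannot be null");
        return Pattern.matches(regex, str);
    }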
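
For discussion, here is a rough sketch of how regex-keyed dispatch might look, written without any Spring dependency so it can be run on its own. RegexLineDispatcher, register and dispatch are hypothetical names, not existing Spring Batch APIs, and the handlers are plain functions standing in for whatever tokenizer and mapper pair a real implementation would delegate to. Unlike the Pattern.matches call above, each regex is compiled once at registration rather than on every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical sketch; not a Spring Batch class. */
final class RegexLineDispatcher<T> {

    /** Pairs a precompiled regex with the handler for lines it matches. */
    private static final class Route<T> {
        final Pattern pattern;
        final Function<String, T> handler;

        Route(Pattern pattern, Function<String, T> handler) {
            this.pattern = pattern;
            this.handler = handler;
        }
    }

    // Registration order is preserved; the first matching route wins.
    private final List<Route<T>> routes = new ArrayList<>();

    RegexLineDispatcher<T> register(String regex, Function<String, T> handler) {
        routes.add(new Route<>(Pattern.compile(regex), handler));
        return this;
    }

    T dispatch(String line) {
        for (Route<T> route : routes) {
            // matches() requires the whole line to match, like Pattern.matches.
            if (route.pattern.matcher(line).matches()) {
                return route.handler.apply(line);
            }
        }
        throw new IllegalStateException("No pattern matches line: " + line);
    }

    public static void main(String[] args) {
        // Assumption: three fields in one file layout, four in the other,
        // and no quoted commas inside fields.
        RegexLineDispatcher<String> dispatcher = new RegexLineDispatcher<String>()
                .register("[^,]*(,[^,]*){2}", line -> "base layout")
                .register("[^,]*(,[^,]*){3}", line -> "layout with trailing text field");

        System.out.println(dispatcher.dispatch("1,2,3"));
        System.out.println(dispatcher.dispatch("1,2,3,some notes"));
    }
}
```

First-match-wins ordering is a design choice of this sketch only; a real implementation would need to decide how to rank overlapping regular expressions.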
Discussed in #4344
Originally posted by jmresler April 8, 2023