Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX (definitive) for regex search matching deleted text #12

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion scintilla/Scintilla.vcxproj
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@
<MultiProcessorCompilation>true</MultiProcessorCompilation>
<Optimization>Disabled</Optimization>
<PrecompiledHeader>NotUsing</PrecompiledHeader>
<PreprocessorDefinitions>X_SCINTILLA_RANGEGAP_FIX;JRB_BUILD;SCI_NAMESPACE;NO_CXX11_REGEX;SCI_OWNREGEX;ONIG_EXTERN=extern;_WIN64;_DEBUG;_WINDOWS;_CRT_SECURE_NO_DEPRECATE;_SCL_SECURE_NO_WARNINGS;STATIC_BUILD;SCI_LEXER;USE_D2D;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PreprocessorDefinitions>JRB_BUILD;SCI_NAMESPACE;NO_CXX11_REGEX;SCI_OWNREGEX;ONIG_EXTERN=extern;_WIN64;_DEBUG;_WINDOWS;_CRT_SECURE_NO_DEPRECATE;_SCL_SECURE_NO_WARNINGS;STATIC_BUILD;SCI_LEXER;USE_D2D;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<RuntimeLibrary>MultiThreadedDebug</RuntimeLibrary>
<WarningLevel>Level3</WarningLevel>
</ClCompile>
Expand Down
6 changes: 0 additions & 6 deletions scintilla/src/CellBuffer.cxx
Original file line number Diff line number Diff line change
Expand Up @@ -598,12 +598,6 @@ const char *CellBuffer::RangePointer(Sci::Position position, Sci::Position range
return substance.RangePointer(position, rangeLength);
}

#ifdef X_SCINTILLA_RANGEGAP_FIX
const char *CellBuffer::DataPointer(Sci::Position position, Sci::Position rangeLength) {
return substance.DataPointer(position, rangeLength);
}

#endif
Sci::Position CellBuffer::GapPosition() const noexcept {
return substance.GapPosition();
}
Expand Down
3 changes: 0 additions & 3 deletions scintilla/src/CellBuffer.h
Original file line number Diff line number Diff line change
Expand Up @@ -150,9 +150,6 @@ class CellBuffer {
void GetStyleRange(unsigned char *buffer, Sci::Position position, Sci::Position lengthRetrieve) const;
const char *BufferPointer();
const char *RangePointer(Sci::Position position, Sci::Position rangeLength);
#ifdef X_SCINTILLA_RANGEGAP_FIX
const char *DataPointer(Sci::Position position, Sci::Position rangeLength);
#endif
Sci::Position GapPosition() const noexcept;

Sci::Position Length() const noexcept;
Expand Down
4 changes: 0 additions & 4 deletions scintilla/src/Document.h
Original file line number Diff line number Diff line change
Expand Up @@ -377,11 +377,7 @@ class Document : PerLine, public IDocumentWithLineEnd, public ILoader {
const char * SCI_METHOD BufferPointer() override { return cb.BufferPointer(); }
std::string SCI_METHOD GetRegexLastError() { return regex->GetErrorInfo(); }

#ifdef X_SCINTILLA_RANGEGAP_FIX
const char *RangePointer(Sci::Position position, Sci::Position rangeLength) { return cb.DataPointer(position, rangeLength); }
#else // !X_SCINTILLA_RANGEGAP_FIX
const char *RangePointer(Sci::Position position, Sci::Position rangeLength) { return cb.RangePointer(position, rangeLength); }
#endif
Sci::Position GapPosition() const { return cb.GapPosition(); }

int SCI_METHOD GetLineIndentation(Sci_Position line) override;
Expand Down
35 changes: 0 additions & 35 deletions scintilla/src/SplitVector.h
Original file line number Diff line number Diff line change
Expand Up @@ -321,41 +321,6 @@ class SplitVector {
}
}

#ifdef X_SCINTILLA_RANGEGAP_FIX
/// Return a pointer to a range of elements *excluding* the gap, while still
/// rearranging the buffer to make that range contiguous.
/// If the pointer is to end up at or past the end of the data (i.e. within
/// the gap once it's been pushed after all elements), the position returned
/// will be the end of the data block, never past it.
T *DataPointer(ptrdiff_t position, ptrdiff_t rangeLength) noexcept {
if (position >= part1Length) {
// Furthest position is at the end of the contiguous
// data block
return body.data() + part1Length;
} else { // position < part1Length
// Check if we have a contiguous data block of
// covering the whole range
// (i.e. from [position .. position + rangeLength] )
if ((position + rangeLength) > part1Length) {
// Range overlaps gap, so move gap to *end* of range.
// (i.e. moving the gap past position + rangeLength)
GapTo(position + rangeLength);
// Now, part1Length might have changed;
// check if we still go over
if (position > part1Length) {
// Return furthest position possible
return body.data() + part1Length;
} else {
// We're good now
return body.data() + position;
}
} else {
return body.data() + position;
}
}
}

#endif
/// Return the position of the gap within the buffer.
ptrdiff_t GapPosition() const noexcept {
return part1Length;
Expand Down
51 changes: 46 additions & 5 deletions scionigmo/OnigmoRegExEngine.cxx
Original file line number Diff line number Diff line change
Expand Up @@ -359,20 +359,61 @@ Sci::Position OnigmoRegExEngine::FindText(Document* doc, Sci::Position minPos, S

// --- search document range for pattern match ---
// !!! Performance issue: Scintilla: moving Gap needs memcopy - high costs for find/replace in large document

// 2019-08-12: Modified Scintilla's code to use something other than
// RangePointer(), which is optimized for insertion of text
// and thus has a habit of giving out pointers passed
// deleted text.
//
// TODO: Report bug at least to the Notepad3 project
// [2024-08-16: Notepad3 fixed it (unintentionally?) thru their commit 66c1209]
//
// Steps to reproduce bug:
// 1) Paste some predefined text in a Scntilla's edit window.
// 2) Devise a regular expression that matches the whole text, using
// 1) Devise a regular expression to match an end of line, using
// at least the $ anchor and making sure the regex doesn't
// end with any of the quantifiers (?, *, + or {m,n}) but instead
// a fixed length section, e.g. [..](?:-(?:20|19)[0-9][0-9]))
// 3) Match the regex pattern
auto const docBegPtr = UCharCPtr(doc->RangePointer(0, docLen));
// a fixed length section.
// e.g. (?:-(?:20|19)[0-9][0-9])$
// 2) Build the subject string that is expected to match pattern
// devised in step 1), and then paste it into Scintilla's edit
// window _after_ having it typed & copied from *another* text
// editor. If caret ends on the same line of text, type a line
// return manually.
// IMPORTANT: Text in step 2) MUST have been pasted, and in one go;
// typing the text directly within Scintilla's edit window
// instead (char-by-char) will FAIL to reproduce the bug.
// If needed, go back to step 2), copy the whole text,
// create a new document discarding the current one,
// and paste back the copied text.
// e.g.: <start-of-doc>16-08-2024<CR><LF>
// 3) Match the regex pattern.
// 4) Delete part of the pasted text from its end, e.g. by a few
// characters *including* the final line return, such that the
// pattern should no longer match.
// 5) Try matching the regex pattern again.
// If it STILL matches: the regex engine is given access to a version of
// the buffer still containing deleted text/chars,
// => bug successfully reproduced
// If it DOES NOT match: the regex engine is given access to an up-to-date
// version of the document buffer.
// => bug absent/fixed
// If bug was reproduced:
// 6) Delete the first char on the text's line and try matching the
// regex pattern again. It should no longer match, indicating that
// the document's buffer has collapsed into a single memory block.
//
// 2024-08-16: Revisited bug, so as to support updating Scintilla's
// code base by undoing modifications by the original fix.
// **BTW: The original bugfix was left out of all Release builds
// due to its dedicated preprocessor define missing (!!!)
// in the project's configuration (sorry!)
//
// BUGFIX: (definitive): Using Document::BufferPointer() collapses
// the buffer into a single block while returning the same
// pointer as doc->RangePointer(0, docLen) does.
//
///auto const docBegPtr = UCharCPtr(doc->RangePointer(0, docLen));
auto const docBegPtr = UCharCPtr(doc->BufferPointer());
auto const docEndPtr = UCharCPtr(doc->RangePointer(docLen, 0));
auto const rangeBegPtr = UCharCPtr(doc->RangePointer(rangeBeg, rangeLen));
auto const rangeEndPtr = UCharCPtr(doc->RangePointer(rangeEnd, 0));
Expand Down