QRegularExpression: allow users to skip the UTF-16 check of the subject string

PCRE does not handle invalid UTF-16 sequences. For this reason we always
check a subject string's UTF-16 validity before attempting any match
over it (actually we let PCRE do that).

The only exception so far has been global matching -- once the first
match was done, we skipped redoing the check over and over again on the
same string (PCRE actually checks the /entire/ string, not only the part
it uses for matching).

Still, users had no way to skip this check if they were 100% sure the
string was a valid UTF-16 string. This commit introduces a way for them
to skip the check.

Change-Id: Iea352c06f531aa2153863b3a1681acaab7ac375c
Reviewed-by: Thiago Macieira <[email protected]>
dangelog authored and The Qt Project committed May 12, 2014
1 parent 4532669 commit fd80cad
Showing 2 changed files with 14 additions and 3 deletions.
14 changes: 12 additions & 2 deletions src/corelib/tools/qregularexpression.cpp
@@ -763,6 +763,13 @@ QT_BEGIN_NAMESPACE
         The match is constrained to start exactly at the offset passed to
         match() in order to be successful, even if the pattern string does not
         contain any metacharacter that anchors the match at that point.
+
+    \value DontCheckSubjectStringMatchOption
+        The subject string is not checked for UTF-16 validity before
+        attempting a match. Use this option with extreme caution, as
+        attempting to match an invalid string may crash the program and/or
+        constitute a security issue. This enum value has been introduced in
+        Qt 5.4.
 */

 // after how many usages we optimize the regexp
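
The new documentation asks callers to be certain that the subject string is valid UTF-16 before passing DontCheckSubjectStringMatchOption. As a rough illustration of what that validity means, here is a minimal standalone sketch that rejects unpaired surrogates. The function name `isValidUtf16` and the use of `std::u16string` are assumptions made for this example; it is not code from this commit, and PCRE's own check may differ in detail.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt or PCRE): returns true if `s` has no
// unpaired surrogate code units, i.e. it is well-formed UTF-16.
bool isValidUtf16(const std::u16string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be immediately followed by a low surrogate.
            if (i + 1 >= s.size())
                return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // skip the low surrogate that completes the pair
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate that is not preceded by a high surrogate.
            return false;
        }
    }
    return true;
}
```

A caller that validates a string once with a check of this kind could then reasonably skip the per-match check on repeated matches over the same, unmodified string.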
@@ -1221,7 +1228,8 @@ static int pcre16SafeExec(const pcre16 *code, const pcre16_extra *extra,
     options \a matchOptions and returns the QRegularExpressionMatchPrivate of
     the result. It also advances a match if a previous result is given as \a
     previous. The \a subject string goes a Unicode validity check if
-    \a checkSubjectString is CheckSubjectString (PCRE doesn't like illegal
+    \a checkSubjectString is CheckSubjectString and the match options don't
+    include DontCheckSubjectStringMatchOption (PCRE doesn't like illegal
     UTF-16 sequences).

     Advancing a match is a tricky algorithm. If the previous match matched a
@@ -1290,8 +1298,10 @@ QRegularExpressionMatchPrivate *QRegularExpressionPrivate::doMatch(const QString
     else if (matchType == QRegularExpression::PartialPreferFirstMatch)
         pcreOptions |= PCRE_PARTIAL_HARD;

-    if (checkSubjectStringOption == DontCheckSubjectString)
+    if (checkSubjectStringOption == DontCheckSubjectString
+            || matchOptions & QRegularExpression::DontCheckSubjectStringMatchOption) {
         pcreOptions |= PCRE_NO_UTF16_CHECK;
+    }

     bool previousMatchWasEmpty = false;
     if (previous && previous->hasMatch &&
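
The condition in this hunk means the check is skipped when either the internal option (used for global matching) or the new user-facing match option asks for it. The following standalone sketch mirrors that logic outside of Qt. The enums are local copies of the names in the diff, while `pcreOptionsFor` and `kNoUtf16Check` are hypothetical names invented for this example; `kNoUtf16Check` is only a placeholder for PCRE_NO_UTF16_CHECK, not the value from pcre.h.

```cpp
#include <cstdint>

// Local stand-ins for the names that appear in the diff.
enum CheckSubjectStringOption { CheckSubjectString, DontCheckSubjectString };

enum MatchOption : std::uint32_t {
    NoMatchOption = 0x0000,
    AnchoredMatchOption = 0x0001,
    DontCheckSubjectStringMatchOption = 0x0002
};

// Placeholder for PCRE_NO_UTF16_CHECK (assumed value, for illustration only).
constexpr std::uint32_t kNoUtf16Check = 0x2000;

// Hypothetical helper: computes the PCRE option bits relevant to the check.
std::uint32_t pcreOptionsFor(CheckSubjectStringOption internalOption,
                             std::uint32_t matchOptions)
{
    std::uint32_t pcreOptions = 0;
    // Skip the validity check if either the internal caller or the user
    // requested it; any other match option bits do not affect the decision.
    if (internalOption == DontCheckSubjectString
            || (matchOptions & DontCheckSubjectStringMatchOption)) {
        pcreOptions |= kNoUtf16Check;
    }
    return pcreOptions;
}
```

Because the option is tested with a bitwise AND, it can be combined with AnchoredMatchOption without changing how either flag is interpreted.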
3 changes: 2 additions & 1 deletion src/corelib/tools/qregularexpression.h
@@ -110,7 +110,8 @@ class Q_CORE_EXPORT QRegularExpression

     enum MatchOption {
         NoMatchOption = 0x0000,
-        AnchoredMatchOption = 0x0001
+        AnchoredMatchOption = 0x0001,
+        DontCheckSubjectStringMatchOption = 0x0002
     };
     Q_DECLARE_FLAGS(MatchOptions, MatchOption)
