Regexploit

Find regular expressions which are vulnerable to Regular Expression Denial of Service (ReDoS).

Most default regular expression matchers use backtracking, and their worst-case running time can grow polynomially or even exponentially with the length of the input. Regex matching may be quick when presented with a matching input string. However, certain non-matching input strings force the matcher to try a huge number of ways of splitting the input between the repeated parts of the pattern before it can give up. This can cause denial of service, as the CPU will be stuck trying to match the regex.

This tool is designed to:

find regular expressions which are vulnerable to ReDoS
give an example malicious string which will cause catastrophic backtracking

Worst-case complexity

This reflects the complexity of the regular expression matcher's backtracking procedure with respect to the length of the entered string.

Cubic complexity here means that if the vulnerable part of the string is doubled in length, the execution time should be about 8 times longer (2^3). For exponential ReDoS with starred stars, e.g. (a*)*$, a fudge factor is used and the reported complexity will be greater than 10.

For exploitability, a cubic complexity or higher is typically required unless truly giant strings are allowed as input.
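The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the scaling rule for polynomial complexities; the function names predicted_slowdown and estimated_degree are made up for this example and are not part of regexploit.

```python
import math


def predicted_slowdown(degree: int, growth: float = 2.0) -> float:
    """Factor by which the run time grows when the vulnerable part of the
    string becomes `growth` times longer, for polynomial complexity `degree`."""
    return growth ** degree


def estimated_degree(n1: int, t1: float, n2: int, t2: float) -> float:
    """Estimate the polynomial degree from two (length, seconds) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


if __name__ == "__main__":
    # Cubic complexity: doubling the length costs 2^3 times as long.
    print(predicted_slowdown(3))
    # Hypothetical timings: 0.5 s at length 1000 and 4.0 s at length 2000.
    print(round(estimated_degree(1000, 0.5, 2000, 4.0), 2))
```

Measuring the run time at two lengths and feeding the numbers to estimated_degree is a rough way to check whether a regex behaves as badly in practice as the reported complexity suggests.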

Example

Run regexploit and enter the regular expression abc*[a-z]+c+$ at the command line.

$ regexploit
abc*[a-z]+c+$
Pattern: abc*[a-z]+c+$
---
Worst-case complexity: 3 ⭐⭐⭐
Repeated character: [c]
Final character to cause backtracking: [^[a-z]]
Example: 'ab' + 'c' * 3456 + '0'

The part c*[a-z]+c+ contains three overlapping repeating groups. As shown in the line Repeated character: [c], a long string of c will match this section in many different ways. The worst-case complexity is 3 as there are 3 infinitely repeating groups. An example to cause ReDoS is given: it consists of the required prefix ab, a long string of c and then a 0 to cause backtracking. Not all ReDoSes require a particular character at the end, but in this case, a long string of c will match the regex successfully and won't backtrack. The line Final character to cause backtracking: [^[a-z]] shows that a non-matching character not in the range [a-z] is required at the end to prevent matching and cause ReDoS.
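To see the effect for yourself, the example string can be tried against Python's built-in re module. The sketch below is not part of regexploit; build_payload and time_match are names invented here, and the lengths are kept much smaller than the 3456 in the tool's output so that the script finishes quickly.

```python
import re
import time

PATTERN = re.compile(r"abc*[a-z]+c+$")


def build_payload(n: int) -> str:
    """Required prefix 'ab', n repeated 'c' characters, then a '0' that
    stops the regex from matching and so forces backtracking."""
    return "ab" + "c" * n + "0"


def time_match(n: int) -> float:
    """Return the seconds spent trying (and failing) to match a payload of size n."""
    payload = build_payload(n)
    start = time.perf_counter()
    PATTERN.match(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Each doubling of n should cost roughly 8 times as long (cubic growth).
    for n in (50, 100, 200):
        print(n, f"{time_match(n):.4f}s")
```

Dropping the trailing 0 from the payload makes the regex match straight away, which is why the final character matters in this case.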

As another example, install a module version vulnerable to ReDoS, such as ua-parser 0.9.0 (pip install ua-parser==0.9.0). To scan the installed Python modules, run regexploit-python-env.

Importing ua_parser.user_agent_parser
Vulnerable regex in /Users/b3n/Research/redosauto/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py #183
Pattern: \bSmartWatch *\( *([^;]+) *; *([^;]+) *;
Context: self.user_agent_re = re.compile(self.pattern)
---
Worst-case complexity: 3 ⭐⭐⭐
Repeated character: [20]
Example: 'SmartWatch(' + ' ' * 3456

Worst-case complexity: 3 ⭐⭐⭐
Repeated character: [20]
Example: 'SmartWatch(0;' + ' ' * 3456

Vulnerable regex in /Users/b3n/Research/redosauto/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py #183
Pattern: ; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)
Context: self.user_agent_re = re.compile(self.pattern)
---
Worst-case complexity: 3 ⭐⭐⭐
Repeated character: [[0-9]]
Example: ';0 Build/HuaweiA' + '0' * 3456
...

For each vulnerable regular expression it prints one or more malicious strings to trigger ReDoS. Setting your user agent to ;0 Build/HuaweiA000000000000000... and browsing a website using an old version of ua-parser may cause the server to take a long time to process your request, probably ending in status 502.

Installation

For now, clone and run

# Optionally make a virtualenv
python3 -m venv .env
source .env/bin/activate
# Now actually install
pip install -e .
(cd regexploit/bin/javascript; npm install --production)

Usage

Regex list

Enter regular expressions via stdin (one per line) into regexploit.

regexploit

or via a file

cat myregexes.txt | regexploit

Nothing is printed when no ReDoS is found.
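If you would rather drive the command from a script, the same stdin interface can be used through subprocess. This is a minimal sketch that assumes the regexploit command is installed and on your PATH; the helper names to_stdin and run_regexploit are hypothetical.

```python
import shutil
import subprocess


def to_stdin(regexes):
    """Format regexes the way regexploit reads them: one per line."""
    return "".join(regex + "\n" for regex in regexes)


def run_regexploit(regexes):
    """Pipe the regexes into the regexploit command and return its stdout.

    Returns None when the command is not installed.
    """
    if shutil.which("regexploit") is None:
        return None
    proc = subprocess.run(
        ["regexploit"],
        input=to_stdin(regexes),
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.stdout


if __name__ == "__main__":
    print(run_regexploit([r"abc*[a-z]+c+$", r"^[a-z]+$"]))
```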

Python imports

Search for regexes in all the Python modules currently installed in your path / env. This means you can pip install whatever modules you are interested in and they will be analysed. CPython code is included.

regexploit-python-env

N.B. this doesn't parse the Python code to an AST and will only find regexes compiled automatically on module import. Modules are actually imported, so code in the modules will be executed.
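To make the limitation concrete, here is one way an import-time scan could work: wrap re.compile, import the module, and record whatever gets compiled. This is an illustrative sketch and not necessarily how regexploit-python-env is implemented; record_compiled_patterns and patterns_compiled_on_import are invented names.

```python
import importlib
import re
from contextlib import contextmanager


@contextmanager
def record_compiled_patterns():
    """Temporarily wrap re.compile and collect every pattern passed to it."""
    seen = []
    original_compile = re.compile

    def recording_compile(pattern, flags=0):
        seen.append(pattern)
        return original_compile(pattern, flags)

    re.compile = recording_compile
    try:
        yield seen
    finally:
        re.compile = original_compile  # always restore the real function


def patterns_compiled_on_import(module_name):
    """Import a module (running its top-level code) and return the patterns
    it compiled. A module that was already imported yields nothing."""
    with record_compiled_patterns() as seen:
        importlib.import_module(module_name)
    return list(seen)
```

A regex compiled inside a function body is only seen if that function happens to be called during import, which is why this approach misses some patterns and why it has to execute the module's code.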

Python code

Parses Python code (without executing it) via the AST to find regexes (with some false positives). The regexes are then analysed for ReDoS.

regexploit-py my-project/stuff.py
regexploit-py "my-project/**/*.py" --glob
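For illustration, a static scan of this kind can be sketched with the standard ast module. This is a simplified example and not regexploit's actual extractor: it only looks for string literals passed as the first argument to functions called on a name re, and find_regex_literals is an invented name.

```python
import ast

# Functions of the re module whose first argument is a pattern.
RE_FUNCTIONS = {
    "compile", "match", "search", "fullmatch",
    "findall", "finditer", "sub", "subn", "split",
}


def find_regex_literals(source):
    """Return sorted (line number, pattern) pairs for calls such as
    re.compile("...") found in the source, without executing it."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
            and node.func.attr in RE_FUNCTIONS
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


if __name__ == "__main__":
    example = "import re\nA = re.compile(r'a+')\ndef f(s):\n    return re.search('b*c', s)\n"
    for lineno, pattern in find_regex_literals(example):
        print(lineno, pattern)
```

Unlike the import-based scan, this finds the pattern inside f even though f is never called. It also shows where false positives come from: any object that happens to be named re would be treated as the regex module.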

Javascript / Typescript

This will use the bundled Node.js package in regexploit/bin/javascript, which parses your JavaScript as an AST with typescript-eslint and prints out all regexes.

Those regexes are fed into the Python ReDoS finder.

regexploit-js my-module/my-file.js another/file.js
regexploit-js "my-project/node_modules/**/*.js" --glob

N.B. there are differences between JavaScript and Python regex parsing, so there may be some errors. I'm not sure I want to write a JS regex AST! Also, use Node.js version >=12.
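One concrete difference is named groups: JavaScript writes (?<name>...) where Python's re module expects (?P<name>...). The small check below, with the invented helper name compiles_in_python, shows how a perfectly valid JavaScript regex can be rejected by Python's parser.

```python
import re


def compiles_in_python(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(compiles_in_python(r"(?<year>\d{4})"))   # JavaScript named group syntax
    print(compiles_in_python(r"(?P<year>\d{4})"))  # Python named group syntax
```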

Ruby

TODO: not so straightforward to extract the regexes because of the way they are often built up from multiple strings.

PHP

TODO: not so straightforward to extract the regexes because of the way they are often built up from multiple strings. Can maybe grep for simple uses of preg_match and pipe into regexploit.
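As a rough sketch of that grep idea, the script below pulls the pattern out of simple preg_match calls whose first argument is a plain string literal. It is an assumption-laden illustration, not a feature of regexploit: extract_preg_patterns is an invented name, and patterns built by concatenation, literals with escaped quotes and bracket-style delimiters are all skipped.

```python
import re

# preg_match('<literal>', ...) or preg_match("<literal>", ...), where the
# literal does not contain the quote character that encloses it.
PREG_MATCH_CALL = re.compile(r"""preg_match\(\s*(['"])((?:(?!\1).)+)\1\s*,""")


def extract_preg_patterns(php_source):
    """Return the regex bodies (delimiters and flags stripped) of simple
    preg_match calls found in the PHP source text."""
    patterns = []
    for call in PREG_MATCH_CALL.finditer(php_source):
        literal = call.group(2)
        delimiter = literal[0]
        # Only handle delimiters that open and close with the same character.
        if delimiter.isalnum() or delimiter.isspace() or delimiter in "\\([{<":
            continue
        end = literal.rfind(delimiter)
        if end <= 0:
            continue  # no closing delimiter found
        patterns.append(literal[1:end])
    return patterns


if __name__ == "__main__":
    import sys

    # Print one regex per line, ready to be piped into regexploit.
    for pattern in extract_preg_patterns(sys.stdin.read()):
        print(pattern)
```

PHP (PCRE) and Python regex syntax are not identical, so some of the extracted patterns may fail to parse.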

Golang / anything using re2

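Unless you specifically use a backtracking regex engine, Go code is not vulnerable to this type of ReDoS. Go's standard regexp package follows the design of re2, which guarantees matching in time linear in the length of the input and so does not have catastrophic backtracking.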

JSON / YAML

regexploit-json *.json
regexploit-yaml *.yaml

Bugs reported

bpo-38804: cpython's http.cookiejar (Set-Cookie header parsing)
CVE-2020-5243: uap-core affecting uap-python, uap-ruby, etc. (User-Agent header parsing)
CVE-2020-8492: cpython's urllib.request (WWW-Authenticate header parsing)
CVE-2021-21236: CairoSVG (SVG parsing)
CVE-2021-21240: httplib2 (WWW-Authenticate header parsing)
CVE-2021-25292: python-pillow (PDF parsing)
CVE-2021-26813: python-markdown2 (Markdown parsing)
CVE-2021-27290: npm/ssri (SRI parsing)
CVE-2021-27291: pygments lexers for ADL, CADL, Ceylon, Evoque, Factor, Logos, Matlab, Octave, ODIN, Scilab & Varnish VCL (Syntax highlighting)
CVE-2021-27292: ua-parser-js (User-Agent header parsing)
CVE-2021-27293: RestSharp (JSON deserialisation in a .NET C# package)
Plus unpublished bugs in pypi packages, npm packages and a nuget (C#) package

Credits

This tool has been created by Ben Caller of Doyensec LLC during research time.
