Skip to content

Find regular expressions which are vulnerable to ReDoS (Regular Expression Denial of Service)

License

Notifications You must be signed in to change notification settings

sumodgeorge/regexploit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Regexploit

Regular Expression Denial of Service (REDoS).

Most default regular expression parsers (non-deterministic finite automata) have unbounded worst-case complexity. While they may be quick when presented with a successfully matching string, certain non-matching input strings can make the regular expression matcher go into crazy loops and take ages to process.

Something something regexes are bad.

Starriness

Not sure what to call this. This reflects the complexity of the regular expression matcher's backtracking procedure with respect to the length of the entered string.

With a starriness of 3, we have approximately cubic complexity. This means that if the vulnerable part of the string is doubled in length, the execution time should be 8 times longer (2^3). For exponential REDoS with starred stars e.g. (a*)*$ a fudge factor is used and the starriness will be greater than 10.

For explotability, a cubic complexity or higher (starriness >= 3) is typically required unless truly giant strings are allowed as input.

Example

Run regexploit and enter the regular expression abc*[a-z]+c+$ at the command line.

$ regexploit
abc*[a-z]+c+$
Vulnerable regex: abc*[a-z]+c+$
Redos(starriness=3, prefix_sequence=SEQ{ [a] [b] }, redos_sequence=SEQ{ [c]{0+} [[a-z]]{1+} [c]{1+} $[[a-z]] }, repeated_character=[c], killer=[^[a-z]])
Starriness: 3
Repeated character: [c]
Final character to cause backtracking: [^[a-z]]
Example: 'ab' + 'c' * 3456 + '0'

The part c*[a-z]+c+ contains three overlapping repeating groups. As showed in the line Repeated character: [c], a long string of c will match this section in many different ways. The starriness is 3 as there are 3 infinitely repeating groups. An example to cause REDoS is given: it consists of the required prefix ab, a long string of c and then a killer 0 to cause backtracking. Not all REDoSes require a particular character at the end, but in this case, a long string of c will match the regex successfully and won't backtrack. The line Final character to cause backtracking: [^[a-z]] shows that a non-matching character not in the range [a-z] is required at the end to prevent matching and cause REDoS.

As another example, install a module version vulnerable to REDoS such as pip install ua-parser==0.9.0. To scan the installed python modules run regexploit-python.

Importing ua_parser.user_agent_parser
Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern)
Pattern: (HbbTV)/[0-9]+\.[0-9]+\.[0-9]+ \([^;]*; *(LG)E *; *([^;]*) *;[^;]*;[^;]*;\)
Redos(starriness=3, prefix_sequence=SEQ{ [H] [b] [b] [T] [V] [2f:/] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [20] [28:(] [^3b:;]{0+} [3b:;] [20]{0+} [L] [G] [E] [20]{0+} [3b:;] }, redos_sequence=SEQ{ [20]{0+} [^3b:;]{0+} [20]{0+} [3b:;] }, repeated_character=[20], killer=None)
Starriness: 3
Repeated character: [20]
Example: 'HbbTV/0.0.0 (;LGE;' + ' ' * 3456

Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern)
Pattern: ; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)
Redos(starriness=3, prefix_sequence=SEQ{ [3b:;] [20]{0+} [^3b:;,2f:/]{1+} [20] [B] [u] [i] [l] [d] [20,2f:/] [H] [u] [a] [w] [e] [i] [[A-Z]]{1+} }, redos_sequence=SEQ{ [DIGIT]{1+} [^29:),3b:;]{1+} [^29:),3b:;]{0+} [29:)] }, repeated_character=[[0-9]], ki
ller=None)
Starriness: 3
Repeated character: [[0-9]]
Example: ';0 Build/HuaweiA' + '0' * 3456
...

For each vulnerable regular expression it prints one or more exploitation.

Installation

For now, clone and run

# Optionally make a virtualenv
python3 -m venv .env
source .env/bin/activate
# Now actually install
pip install -e .
(cd regexploit/bin/javascript; npm install --production)

Usage

Regex list

Enter regular expressions via stdin (one per line) into regexploit.

regexploit

or via a file

cat myregexes.txt | regexploit

Python imports

Search for regexes in all the python modules currently installed in your path / env. This means you can pip install whatever modules you are interested in and they will be analysed.

regexploit-python

N.B. this doesn't parse the python code to an AST and will only find regexes compiled automatically on import.

TODO: parse python AST, with the ast module.

Javascript / Typescript

This will use the NodeJS code in regexploit/bin/javascript which parses your javascript as an AST with typescript-eslint and prints out all regexes.

Those regexes are fed into the python REDoS finder.

regexploit-js my-module/my-file.js another/file.js
regexploit-js "my-project/node_modules/**/*.js" --glob

N.B. there are differences between javascript and python regex parsing so there may be some errors. I'm not sure I want to write a JS regex AST!

Ruby

TODO: not so straight forward

PHP

TODO

Golang

Unless you specifically use a non-deterministic finite automata, Go code is safe from REDoS. It uses re2 which matches in linear time.

About

Find regular expressions which are vulnerable to ReDoS (Regular Expression Denial of Service)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 94.0%
  • JavaScript 5.3%
  • C# 0.7%