forked from doyensec/regexploit
Uses a TypeScript parser, chosen with the help of https://astexplorer.net/. It finds any regex literals or uses of RegExp with a string literal. Now we have: regexploit, regexploit-python and regexploit-js. Cool! Also a WIP readme.
Showing 15 changed files with 4,353 additions and 45 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,3 +2,4 @@ | |
__pycache__ | ||
*.egg-info | ||
*.log | ||
node_modules |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,124 @@ | ||
this is a readme | ||
# Regexploit | ||
|
||
Regular Expression Denial of Service (REDoS). | ||
|
||
Most default regular expression engines are backtracking matchers whose worst-case running time grows polynomially or even exponentially with the input length. While they may be quick when presented with a successfully matching string, certain non-matching input strings can force the matcher to try a huge number of alternative ways of matching before giving up, so the input takes a very long time to process. | ||
|
||
Something something regexes are bad. | ||
|
||
## Starriness | ||
|
||
This reflects the complexity of the regular expression matcher's backtracking procedure with respect to the length of the entered string. | ||
|
||
With a starriness of 3, we have approximately cubic complexity. This means that if the vulnerable part of the string is doubled in length, the execution time should be 8 times longer (2^3). | ||
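The scaling rule above can be written down as a short calculation. This is a minimal sketch that assumes purely polynomial behaviour (time proportional to length raised to the starriness); the function names `predicted_slowdown` and `estimated_starriness` are hypothetical and not part of regexploit.

```python
import math


def predicted_slowdown(starriness, length_factor):
    """Predicted growth in matching time when the vulnerable part of the
    input becomes `length_factor` times longer (assumes time ~ n ** starriness)."""
    return length_factor ** starriness


def estimated_starriness(n1, t1, n2, t2):
    """Estimate the exponent from two (length, time) measurements,
    again assuming time ~ n ** starriness."""
    return math.log(t2 / t1) / math.log(n2 / n1)
```

For example, `predicted_slowdown(3, 2)` gives the factor of 8 quoted above, and two timings taken at different input lengths can be passed to `estimated_starriness` to check whether a regex behaves roughly as reported.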
|
||
For exploitability, a cubic complexity or higher (starriness >= 3) is required unless truly giant strings are allowed as input. | ||
|
||
For exponential REDoS with starred stars, e.g. `(a*)*$`, a fudge factor is used and the starriness will be greater than 10. | ||
|
||
## Example | ||
|
||
Run `regexploit` and enter the regular expression `abc*[a-z]+c+$` at the command line. | ||
|
||
``` | ||
$ regexploit | ||
abc*[a-z]+c+$ | ||
Vulnerable regex: abc*[a-z]+c+$ | ||
Redos(starriness=3, prefix_sequence=SEQ{ [a] [b] }, redos_sequence=SEQ{ [c]{0+} [[a-z]]{1+} [c]{1+} $[[a-z]] }, repeated_character=[c], killer=[^[a-z]]) | ||
Starriness: 3 | ||
Repeated character: [c] | ||
Final character to cause backtracking: [^[a-z]] | ||
Example: 'ab' + 'c' * 3456 + '0' | ||
``` | ||
|
||
The part `c*[a-z]+c+` contains three overlapping repeating groups. As shown in the line `Repeated character: [c]`, a long string of `c` will match this section in many different ways. The starriness is 3 as there are 3 infinitely repeating groups. An example to cause backtracking is given: it consists of the required prefix `ab`, a long string of `c` and then a killer `0` to cause backtracking. Not all REDoSes require a particular character at the end, but in this case, a long string of `c` will match the regex successfully and won't backtrack. The line `Final character to cause backtracking: [^[a-z]]` shows that a non-matching character out of the range `[a-z]` is required at the end to prevent matching and cause REDoS. | ||
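The role of the killer character can be checked directly with Python's `re` module. This is a rough sketch rather than anything regexploit runs itself: `build_attack` and `time_match` are hypothetical helper names, and the repeat count is kept far smaller than the 3456 in the report so that the check finishes quickly.

```python
import re
import time

PATTERN = re.compile(r"abc*[a-z]+c+$")


def build_attack(n, killer="0"):
    """Required prefix 'ab', then n repeated 'c', then the killer character."""
    return "ab" + "c" * n + killer


def time_match(n):
    """Return (match result, elapsed seconds) for an attack string with n repeats."""
    attack = build_attack(n)
    start = time.perf_counter()
    result = PATTERN.match(attack)
    return result, time.perf_counter() - start
```

Without the killer, `PATTERN.match("ab" + "c" * 30)` succeeds on the first successful split of the `c` run. With the killer appended, the match fails only after the engine has tried the alternative ways of splitting the run between the three groups, so calling `time_match` with increasing `n` should show the elapsed time growing much faster than `n` itself.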
|
||
As another example, install a version of a module that is vulnerable to REDoS, such as `pip install ua-parser==0.9.0`. | ||
To scan the installed Python modules, run `regexploit-python`. | ||
|
||
``` | ||
Importing ua_parser.user_agent_parser | ||
Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern) | ||
Pattern: (HbbTV)/[0-9]+\.[0-9]+\.[0-9]+ \([^;]*; *(LG)E *; *([^;]*) *;[^;]*;[^;]*;\) | ||
Redos(starriness=3, prefix_sequence=SEQ{ [H] [b] [b] [T] [V] [2f:/] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [20] [28:(] [^3b:;]{0+} [3b:;] [20]{0+} [L] [G] [E] [20]{0+} [3b:;] }, redos_sequence=SEQ{ [20]{0+} [^3b:;]{0+} [20]{0+} [3b:;] }, repeated_character=[20], killer=None) | ||
Starriness: 3 | ||
Repeated character: [20] | ||
Example: 'HbbTV/0.0.0 (;LGE;' + ' ' * 3456 | ||
Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern) | ||
Pattern: ; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\) | ||
Redos(starriness=3, prefix_sequence=SEQ{ [3b:;] [20]{0+} [^3b:;,2f:/]{1+} [20] [B] [u] [i] [l] [d] [20,2f:/] [H] [u] [a] [w] [e] [i] [[A-Z]]{1+} }, redos_sequence=SEQ{ [DIGIT]{1+} [^29:),3b:;]{1+} [^29:),3b:;]{0+} [29:)] }, repeated_character=[[0-9]], killer=None) | ||
Starriness: 3 | ||
Repeated character: [[0-9]] | ||
Example: ';0 Build/HuaweiA' + '0' * 3456 | ||
... | ||
``` | ||
|
||
For each vulnerable regular expression, it prints one or more example exploit strings. | ||
|
||
# Installation | ||
|
||
For now, clone the repository and run: | ||
|
||
```bash | ||
# Optionally make a virtualenv | ||
python3 -m venv .env | ||
source .env/bin/activate | ||
# Now actually install | ||
pip install -e . | ||
(cd regexploit/bin/javascript; npm install --production) | ||
``` | ||
|
||
# Usage | ||
|
||
## Regex list | ||
|
||
Enter regular expressions via stdin (one per line) into `regexploit`. | ||
|
||
```bash | ||
regexploit | ||
``` | ||
|
||
or via a file | ||
|
||
```bash | ||
cat myregexes.txt | regexploit | ||
``` | ||
|
||
## Python imports | ||
|
||
Search for regexes in all the Python modules currently installed in your path / env. This means you can `pip install` whatever modules you are interested in and they will be analysed. | ||
|
||
```bash | ||
regexploit-python | ||
``` | ||
|
||
N.B. this doesn't parse the Python code to an AST, so it will only find regexes that are compiled automatically when a module is imported. | ||
|
||
TODO: parse python AST, with the `ast` module. | ||
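One possible shape for the AST approach mentioned in the TODO is sketched below. It is only an illustration under stated assumptions, not regexploit's implementation: `find_regex_literals` and `RE_FUNCTIONS` are hypothetical names, and the sketch only recognises calls written as `re.<function>("literal", ...)`, so aliased imports, patterns built at runtime and `from re import compile` would all be missed.

```python
import ast

# Functions in the standard `re` module whose first argument is a pattern.
RE_FUNCTIONS = {
    "compile", "match", "search", "fullmatch",
    "findall", "finditer", "split", "sub", "subn",
}


def find_regex_literals(source):
    """Yield (line number, pattern) for re.<function>("literal", ...) calls."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        is_re_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in RE_FUNCTIONS
        )
        first_arg = node.args[0]
        # Only string literals are reported; variables are skipped.
        if (
            is_re_call
            and isinstance(first_arg, ast.Constant)
            and isinstance(first_arg.value, str)
        ):
            yield node.lineno, first_arg.value
```

Unlike the import-based scan, this would also see patterns inside functions that are never called at import time, at the cost of missing anything that is not a plain string literal.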
|
||
## Javascript / Typescript | ||
|
||
This will use the Node.js code in `regexploit/bin/javascript`, which parses your JavaScript as an AST with [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/master/packages/parser) and prints out all regexes. | ||
|
||
Those regexes are fed into the python REDoS finder. | ||
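Judging from `cli.js` in this commit, the Node.js side prints one JSON object per line, carrying either `pattern`, `flags` and `filename`, or an `error`. How regexploit actually consumes that output is not shown here, so the following is only a sketch of reading such lines; `parse_extractor_output` is a hypothetical name.

```python
import json


def parse_extractor_output(lines):
    """Split JSON lines from the JavaScript extractor into regexes and errors."""
    regexes, errors = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        if "error" in record:
            errors.append(record)
        else:
            regexes.append(record)
    return regexes, errors
```

Keeping the error records separate means that a file which fails to parse can be reported without stopping the analysis of the remaining patterns.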
|
||
```bash | ||
regexploit-js my-module/my-file.js another/file.js | ||
regexploit-js "my-project/node_modules/**/*.js" --glob | ||
``` | ||
|
||
N.B. there are differences between JavaScript and Python regex parsing, so there may be some errors. I'm [not sure I want](https://hackernoon.com/the-madness-of-parsing-real-world-javascript-regexps-d9ee336df983) to write a JS regex AST! | ||
|
||
## Ruby | ||
|
||
TODO: not so straightforward | ||
|
||
## PHP | ||
|
||
TODO | ||
|
||
## Golang | ||
|
||
Unless you specifically use a third-party backtracking regex engine, Go code is safe from REDoS. The standard `regexp` package follows the RE2 design and guarantees matching in time linear in the size of the input. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
const fs = require('fs').promises; | ||
const readline = require('readline'); | ||
const findRegex = require('./find'); | ||
|
||
module.exports = { | ||
async * parseFile(filename) { | ||
try { | ||
// Read as UTF-8 text so the parser receives a string rather than a Buffer | ||
const code = await fs.readFile(filename, 'utf8'); | ||
yield* this.parseCode(code, filename); | ||
} catch (error) { | ||
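// Error properties are not enumerable, so stringify the error explicitly | ||
yield JSON.stringify({ error: String(error), filename }); | ||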
} | ||
}, | ||
|
||
* parseCode(code, filename) { | ||
try { | ||
for (const regex of findRegex.extractRegexesFromSource(code)) { | ||
yield JSON.stringify({ | ||
...regex, | ||
filename, | ||
}); | ||
} | ||
} catch (error) { | ||
// Error properties are not enumerable, so stringify the error explicitly | ||
yield JSON.stringify({ error: String(error), filename }); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
const parser = require('@typescript-eslint/parser'); | ||
|
||
module.exports = { | ||
* extractRegexesFromSource(content) { | ||
// options https://github.com/typescript-eslint/typescript-eslint/blob/master/packages/types/src/parser-options.ts | ||
const tree = parser.parse(content, { | ||
ecmaFeatures: { | ||
jsx: true | ||
}, | ||
ecmaVersion: 9, | ||
errorOnTypeScriptSyntacticAndSemanticIssues: false, | ||
errorOnUnknownASTType: false, | ||
range: true, | ||
}); | ||
yield* this.walkASTForRegexes(tree); | ||
}, | ||
|
||
* walkASTForRegexes(tree) { | ||
if (!tree) { | ||
return; | ||
} | ||
if (tree.regex) { | ||
yield tree.regex; | ||
return; | ||
} | ||
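if ( | ||
(tree.type === 'NewExpression' || tree.type === 'CallExpression') && | ||
tree.callee && tree.callee.name === 'RegExp' && | ||
// Guard against RegExp() with no arguments and only accept string literal patterns | ||
tree.arguments && tree.arguments.length > 0 && | ||
tree.arguments[0].type === 'Literal' && typeof tree.arguments[0].value === 'string' | ||
) { | ||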
yield { | ||
'pattern': tree.arguments[0].value, | ||
'flags': tree.arguments.length > 1 && tree.arguments[1].type == 'Literal' ? tree.arguments[1].value : '' | ||
} | ||
return; | ||
} | ||
for (const element of Object.values(tree)) { | ||
if (element && typeof element == 'object') { | ||
yield* this.walkASTForRegexes(element); | ||
} | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
#!/usr/bin/env node | ||
const readline = require('readline'); | ||
const cli = require('./cli'); | ||
|
||
const args = process.argv.slice(2) | ||
|
||
|
||
if (args.length == 1 && args[0] == '-') { | ||
process.stdin.setEncoding('utf-8'); | ||
let data = ''; | ||
readline.createInterface({input: process.stdin}) | ||
// 'line' events strip the line terminator, so add it back to keep the source intact | ||
.on('line', l => data += l + '\n') | ||
.on('close', () => { | ||
for (const output of cli.parseCode(data)) { | ||
console.log(output); | ||
} | ||
}) | ||
} else { | ||
(async () => { | ||
for (const filename of args) { | ||
for await (let output of cli.parseFile(filename)) { | ||
console.log(output); | ||
} | ||
} | ||
})() | ||
} |