Javascript AST parser & scripts
Uses a typescript parser chosen by using https://astexplorer.net/
It finds any regex literals or uses of RegExp with a string literal.

Now we have:
regexploit
regexploit-python
regexploit-js

Cool!

Also a WIP readme
b-c-ds authored and bcaller committed Nov 29, 2020
1 parent df09fed commit ac13ff0
Showing 15 changed files with 4,353 additions and 45 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@
__pycache__
*.egg-info
*.log
node_modules
125 changes: 124 additions & 1 deletion README.md
@@ -1 +1,124 @@
this is a readme
# Regexploit

Regular Expression Denial of Service (REDoS).

Most default regular expression matchers are backtracking engines whose worst-case running time is exponential in the length of the input. They are usually quick when given a string that matches, but certain non-matching inputs force the matcher to try a huge number of ways of splitting the string, so a single match can take a very long time.

Something something regexes are bad.

## Starriness

This reflects the complexity of the regular expression matcher's backtracking procedure with respect to the length of the entered string.

With a starriness of 3, we have approximately cubic complexity. This means that if the vulnerable part of the string is doubled in length, the execution time should be 8 times longer (2^3).
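
As a quick worked check of the arithmetic above, the expected slowdown is roughly the growth factor raised to the power of the starriness. The helper name `predicted_slowdown` is made up for illustration and is not part of regexploit.

```python
def predicted_slowdown(starriness: int, growth_factor: float) -> float:
    """Approximate slowdown for polynomial backtracking of degree `starriness`.

    Hypothetical helper: it only restates the rule of thumb from the text.
    """
    return growth_factor ** starriness


# Doubling the vulnerable part of the input with cubic complexity.
print(predicted_slowdown(3, 2))
# Making the vulnerable part ten times longer.
print(predicted_slowdown(3, 10))
```

This is only an approximation: real timings also include constant overheads, so measured ratios on short strings tend to be noisier.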

For exploitability, cubic complexity or higher (starriness >= 3) is required unless truly giant strings are allowed as input.

For exponential REDoS with starred stars, e.g. `(a*)*$`, a fudge factor is used and the starriness will be greater than 10.

## Example

Run `regexploit` and enter the regular expression `abc*[a-z]+c+$` at the command line.

```
$ regexploit
abc*[a-z]+c+$
Vulnerable regex: abc*[a-z]+c+$
Redos(starriness=3, prefix_sequence=SEQ{ [a] [b] }, redos_sequence=SEQ{ [c]{0+} [[a-z]]{1+} [c]{1+} $[[a-z]] }, repeated_character=[c], killer=[^[a-z]])
Starriness: 3
Repeated character: [c]
Final character to cause backtracking: [^[a-z]]
Example: 'ab' + 'c' * 3456 + '0'
```

The part `c*[a-z]+c+` contains three overlapping repeating groups. As shown in the line `Repeated character: [c]`, a long string of `c` will match this section in many different ways. The starriness is 3 as there are 3 infinitely repeating groups. An example to cause backtracking is given: it consists of the required prefix `ab`, a long string of `c` and then a killer `0`. Not all REDoSes require a particular character at the end, but in this case a long string of `c` on its own would match the regex successfully without backtracking. The line `Final character to cause backtracking: [^[a-z]]` shows that a character outside the range `[a-z]` is required at the end to prevent a match and trigger the REDoS.
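
The behaviour described above can be reproduced with Python's own `re` module. This is a rough sketch, not part of regexploit: the helper names `attack_string` and `time_match` are invented here, and the string lengths are kept far below the suggested 3456 so the script finishes quickly.

```python
import re
import time

PATTERN = re.compile(r"abc*[a-z]+c+$")


def attack_string(n: int, killer: str = "0") -> str:
    # Required prefix, a long run of the repeated character, then the killer.
    return "ab" + "c" * n + killer


def time_match(text: str) -> float:
    # Wall-clock time for a single match attempt anchored at the start.
    start = time.perf_counter()
    PATTERN.match(text)
    return time.perf_counter() - start


# Without the killer, the regex matches.
assert PATTERN.match(attack_string(50, killer="")) is not None
# With the killer, there is no match and every split of the run of c's is tried.
assert PATTERN.match(attack_string(50)) is None

# Each doubling of n should cost very roughly 8x (cubic), allowing for noise.
for n in (50, 100, 200):
    print(n, f"{time_match(attack_string(n)):.5f}s")
```

The exact timings depend on the machine and the Python version, so treat the printed numbers as indicative only.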

As another example, install a version of a module that is vulnerable to REDoS, such as `pip install ua-parser==0.9.0`.
To scan the installed Python modules, run `regexploit-python`.

```
Importing ua_parser.user_agent_parser
Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern)
Pattern: (HbbTV)/[0-9]+\.[0-9]+\.[0-9]+ \([^;]*; *(LG)E *; *([^;]*) *;[^;]*;[^;]*;\)
Redos(starriness=3, prefix_sequence=SEQ{ [H] [b] [b] [T] [V] [2f:/] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [20] [28:(] [^3b:;]{0+} [3b:;] [20]{0+} [L] [G] [E] [20]{0+} [3b:;] }, redos_sequence=SEQ{ [20]{0+} [^3b:;]{0+} [20]{0+} [3b:;] }, repeated_character=[20], killer=None)
Starriness: 3
Repeated character: [20]
Example: 'HbbTV/0.0.0 (;LGE;' + ' ' * 3456
Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern)
Pattern: ; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)
Redos(starriness=3, prefix_sequence=SEQ{ [3b:;] [20]{0+} [^3b:;,2f:/]{1+} [20] [B] [u] [i] [l] [d] [20,2f:/] [H] [u] [a] [w] [e] [i] [[A-Z]]{1+} }, redos_sequence=SEQ{ [DIGIT]{1+} [^29:),3b:;]{1+} [^29:),3b:;]{0+} [29:)] }, repeated_character=[[0-9]], killer=None)
Starriness: 3
Repeated character: [[0-9]]
Example: ';0 Build/HuaweiA' + '0' * 3456
...
```

For each vulnerable regular expression it prints one or more example exploit strings.

# Installation

For now, clone and run

```bash
# Optionally make a virtualenv
python3 -m venv .env
source .env/bin/activate
# Now actually install
pip install -e .
(cd regexploit/bin/javascript; npm install --production)
```

# Usage

## Regex list

Enter regular expressions via stdin (one per line) into `regexploit`.

```bash
regexploit
```

or via a file

```bash
cat myregexes.txt | regexploit
```

## Python imports

Search for regexes in all the Python modules currently installed in your path / env. This means you can `pip install` whatever modules you are interested in and they will be analysed.

```bash
regexploit-python
```

N.B. this doesn't parse the Python code into an AST, so it will only find regexes that are compiled as a side effect of importing the module.
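
One plausible way to observe regexes compiled during import is to wrap `re.compile` temporarily. The sketch below is an assumption about how such a hook could look, not a description of how `regexploit-python` actually works, and `record_compiled_patterns` is a made-up name.

```python
import contextlib
import re


@contextlib.contextmanager
def record_compiled_patterns():
    """Temporarily wrap re.compile and collect every pattern passed to it."""
    seen = []
    original_compile = re.compile

    def recording_compile(pattern, flags=0):
        seen.append(pattern)
        return original_compile(pattern, flags)

    re.compile = recording_compile
    try:
        yield seen
    finally:
        # Always restore the real function, even if the import fails.
        re.compile = original_compile


with record_compiled_patterns() as patterns:
    # Stand-in for `import some_module` whose top-level code calls re.compile.
    re.compile(r"abc*[a-z]+c+$")

print(patterns)
```

A hook like this never sees regexes that are only compiled inside functions which are not called at import time, which is the limitation noted above.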

TODO: parse python AST, with the `ast` module.
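
A minimal sketch of what such an `ast`-based scan might look like, assuming Python 3.8+ (for `ast.Constant`). The function name `find_regex_literals` and the set of `re` functions it checks are illustrative choices, not existing regexploit code.

```python
import ast

# Functions in the re module whose first argument is a pattern.
RE_FUNCTIONS = {"compile", "match", "search", "fullmatch",
                "findall", "finditer", "sub", "split"}


def find_regex_literals(source: str):
    """Yield (line_number, pattern) for re.<func>("literal", ...) calls."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        is_re_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in RE_FUNCTIONS
        )
        first = node.args[0]
        # Only plain string literals are reported; bytes and variables are skipped.
        if is_re_call and isinstance(first, ast.Constant) and isinstance(first.value, str):
            yield node.lineno, first.value


sample = (
    "import re\n"
    'X = re.compile(r"abc*[a-z]+c+$")\n'
    "def f(s):\n"
    '    return re.search("a+b", s)\n'
)
for lineno, pattern in sorted(find_regex_literals(sample)):
    print(lineno, pattern)
```

This sketch deliberately misses aliased imports such as `import re as r` or `from re import compile`, and patterns built at runtime, so it would complement rather than replace the import-based scan.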

## Javascript / Typescript

This will use the NodeJS code in `regexploit/bin/javascript` which parses your javascript as an AST with [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/master/packages/parser) and prints out all regexes.

Those regexes are fed into the python REDoS finder.

```bash
regexploit-js my-module/my-file.js another/file.js
regexploit-js "my-project/node_modules/**/*.js" --glob
```

N.B. there are differences between JavaScript and Python regex syntax, so some patterns may fail to parse or be analysed incorrectly. I'm [not sure I want](https://hackernoon.com/the-madness-of-parsing-real-world-javascript-regexps-d9ee336df983) to write a JS regex AST!
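
The NodeJS helper in this commit prints one JSON object per line, either a regex with `pattern`, `flags` and `filename`, or an object with an `error` key. The sketch below shows one way the Python side could consume that output. The function name `parse_helper_output` and the sample lines are invented for illustration, and the real reader in regexploit may differ.

```python
import json


def parse_helper_output(lines):
    """Split the helper's JSON lines into found regexes and reported errors."""
    regexes, errors = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "error" in record:
            errors.append(record)
        else:
            regexes.append(
                (record.get("filename"), record["pattern"], record.get("flags", ""))
            )
    return regexes, errors


# Made-up sample lines in the shape produced by cli.js.
sample_output = [
    '{"pattern": "abc*[a-z]+c+$", "flags": "i", "filename": "my-file.js"}',
    '{"error": {}, "filename": "broken.js"}',
]
regexes, errors = parse_helper_output(sample_output)
print(regexes)
print(errors)
```

Keeping errors separate from regexes lets a caller report unparsable files without aborting the scan of the remaining ones.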

## Ruby

TODO: not so straightforward

## PHP

TODO

## Golang

Unless you specifically use a backtracking regex library, Go code is safe from REDoS. The standard `regexp` package uses RE2-style matching, which runs in time linear in the length of the input.
2 changes: 1 addition & 1 deletion regexploit/ast/branch.py
@@ -33,7 +33,7 @@ def overall_character_class(self) -> Optional[Character]:
return c

def maximal_character_class(self):
raise NotImplementedError
return None # Really?

def example(self) -> str:
if self.optional:
27 changes: 27 additions & 0 deletions regexploit/bin/javascript/cli.js
@@ -0,0 +1,27 @@
const fs = require('fs').promises;
const readline = require('readline');
const findRegex = require('./find');

module.exports = {
async * parseFile(filename) {
try {
const code = await fs.readFile(filename, 'utf8'); // read as text rather than a Buffer
yield* this.parseCode(code, filename);
} catch (error) {
yield JSON.stringify({ error });
}
},

* parseCode(code, filename) {
try {
for (const regex of findRegex.extractRegexesFromSource(code)) {
yield JSON.stringify({
...regex,
filename,
});
}
} catch (error) {
yield JSON.stringify({ error, filename });
}
}
}
42 changes: 42 additions & 0 deletions regexploit/bin/javascript/find.js
@@ -0,0 +1,42 @@
const parser = require('@typescript-eslint/parser');

module.exports = {
* extractRegexesFromSource(content) {
// options https://github.com/typescript-eslint/typescript-eslint/blob/master/packages/types/src/parser-options.ts
const tree = parser.parse(content, {
ecmaFeatures: {
jsx: true
},
ecmaVersion: 9,
errorOnTypeScriptSyntacticAndSemanticIssues: false,
errorOnUnknownASTType: false,
range: true,
});
yield* this.walkASTForRegexes(tree);
},

* walkASTForRegexes(tree) {
if (!tree) {
return;
}
if (tree.regex) {
yield tree.regex;
return;
}
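if (
(tree.type == 'NewExpression' || tree.type == 'CallExpression') &&
tree.callee && tree.callee.name == 'RegExp' &&
// Guard against RegExp() with no arguments before reading arguments[0].
tree.arguments && tree.arguments.length > 0 && tree.arguments[0].type == 'Literal'
) {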
yield {
'pattern': tree.arguments[0].value,
'flags': tree.arguments.length > 1 && tree.arguments[1].type == 'Literal' ? tree.arguments[1].value : ''
}
return;
}
for (const element of Object.values(tree)) {
if (element && typeof element == 'object') {
yield* this.walkASTForRegexes(element);
}
}
}
}
26 changes: 26 additions & 0 deletions regexploit/bin/javascript/index.js
@@ -0,0 +1,26 @@
#!/usr/bin/env node
const readline = require('readline');
const cli = require('./cli');

const args = process.argv.slice(2)


if (args.length == 1 && args[0] == '-') {
process.stdin.setEncoding('utf-8');
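let data = "";
readline.createInterface({input: process.stdin})
// readline strips line terminators, so re-add them; otherwise a line comment would swallow the following line.
.on('line', l => data += l + '\n')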
.on('close', () => {
for (const output of cli.parseCode(data)) {
console.log(output);
}
})
} else {
(async () => {
for (const filename of args) {
for await (let output of cli.parseFile(filename)) {
console.log(output);
}
}
})()
}