JavaScript AST parser & scripts

Uses a TypeScript parser (picked with the help of https://astexplorer.net/). It finds any regex literals or uses of RegExp with a string literal.

Now we have:
- regexploit
- regexploit-python
- regexploit-js

Cool! Also a WIP readme.
fl0-O · Nov 29, 2020 · ac13ff0
1 parent df09fed

Showing 15 changed files with 4,353 additions and 45 deletions.
diff --git a/.gitignore b/.gitignore
@@ -2,3 +2,4 @@
 __pycache__
 *.egg-info
 *.log
+node_modules
diff --git a/README.md b/README.md
@@ -1 +1,124 @@
-this is a readme
+# Regexploit
+
+Regular Expression Denial of Service (REDoS).
+
+Most default regular expression engines use backtracking (a way of simulating non-deterministic finite automata) and have exponential worst-case complexity. While they may be quick when presented with a successfully matching string, certain non-matching input strings can make the regular expression matcher go into crazy loops and take ages to process.
+
+Something something regexes are bad.
+
+## Starriness
+
+This reflects the complexity of the regular expression matcher's backtracking procedure with respect to the length of the input string.
+
+With a starriness of 3, we have approximately cubic complexity. This means that if the vulnerable part of the string is doubled in length, the execution time should be 8 times longer (2^3).
+
+For exploitability, a cubic complexity or higher (starriness >= 3) is required unless truly giant strings are allowed as input.
+
+For exponential REDoS with starred stars, e.g. `(a*)*$`, a fudge factor is used and the starriness will be greater than 10.
+
+## Example
+
+Run `regexploit` and enter the regular expression `abc*[a-z]+c+$` at the command line.
+
+```
+$ regexploit
+abc*[a-z]+c+$
+Vulnerable regex: abc*[a-z]+c+$
+Redos(starriness=3, prefix_sequence=SEQ{ [a] [b] }, redos_sequence=SEQ{ [c]{0+} [[a-z]]{1+} [c]{1+} $[[a-z]] }, repeated_character=[c], killer=[^[a-z]])
+Starriness: 3
+Repeated character: [c]
+Final character to cause backtracking: [^[a-z]]
+Example: 'ab' + 'c' * 3456 + '0'
+```
+
+The part `c*[a-z]+c+` contains three overlapping repeating groups. As shown in the line `Repeated character: [c]`, a long string of `c` will match this section in many different ways. The starriness is 3 as there are 3 infinitely repeating groups. An example to cause backtracking is given: it consists of the required prefix `ab`, a long string of `c` and then a killer `0` to cause backtracking. Not all REDoSes require a particular character at the end, but in this case a long string of `c` alone will match the regex successfully without catastrophic backtracking. The line `Final character to cause backtracking: [^[a-z]]` shows that a character outside the range `[a-z]` is required at the end to prevent matching and cause REDoS.
+
+As another example, install a module version vulnerable to REDoS such as `pip install ua-parser==0.9.0`.
+To scan the installed Python modules, run `regexploit-python`:
+
+```
+Importing ua_parser.user_agent_parser
+Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern)
+Pattern: (HbbTV)/[0-9]+\.[0-9]+\.[0-9]+ \([^;]*; *(LG)E *; *([^;]*) *;[^;]*;[^;]*;\)
+Redos(starriness=3, prefix_sequence=SEQ{ [H] [b] [b] [T] [V] [2f:/] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [2e:.] [[0-9]]{1+} [20] [28:(] [^3b:;]{0+} [3b:;] [20]{0+} [L] [G] [E] [20]{0+} [3b:;] }, redos_sequence=SEQ{ [20]{0+} [^3b:;]{0+} [20]{0+} [3b:;] }, repeated_character=[20], killer=None)
+Starriness: 3
+Repeated character: [20]
+Example: 'HbbTV/0.0.0 (;LGE;' + ' ' * 3456
+
+Vulnerable regex at /xyz/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py, L183: self.user_agent_re = re.compile(self.pattern)
+Pattern: ; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)
+Redos(starriness=3, prefix_sequence=SEQ{ [3b:;] [20]{0+} [^3b:;,2f:/]{1+} [20] [B] [u] [i] [l] [d] [20,2f:/] [H] [u] [a] [w] [e] [i] [[A-Z]]{1+} }, redos_sequence=SEQ{ [DIGIT]{1+} [^29:),3b:;]{1+} [^29:),3b:;]{0+} [29:)] }, repeated_character=[[0-9]], killer=None)
+Starriness: 3
+Repeated character: [[0-9]]
+Example: ';0 Build/HuaweiA' + '0' * 3456
+...
+```
+
+For each vulnerable regular expression, it prints one or more exploitation examples.
+
+# Installation
+
+For now, clone the repository and run:
+
+```bash
+# Optionally make a virtualenv
+python3 -m venv .env
+source .env/bin/activate
+# Now actually install
+pip install -e .
+(cd regexploit/bin/javascript; npm install --production)
+```
+
+# Usage
+
+## Regex list
+
+Enter regular expressions via stdin (one per line) into `regexploit`.
+
+```bash
+regexploit
+```
+
+or pipe them in from a file:
+
+```bash
+cat myregexes.txt | regexploit
+```
+
+## Python imports
+
+Search for regexes in all the Python modules currently installed in your path / env. This means you can `pip install` whatever modules you are interested in and they will be analysed.
+
+```bash
+regexploit-python
+```
+
+N.B. this doesn't parse the Python code to an AST and will only find regexes compiled automatically on import.
+
+TODO: parse the Python AST with the `ast` module.
+
+## Javascript / Typescript
+
+This will use the Node.js code in `regexploit/bin/javascript`, which parses your JavaScript as an AST with [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/master/packages/parser) and prints out all regexes.
+
+Those regexes are fed into the python REDoS finder.
+
+```bash
+regexploit-js my-module/my-file.js another/file.js
+regexploit-js "my-project/node_modules/**/*.js" --glob
+```
+
+N.B. there are differences between JavaScript and Python regex parsing, so there may be some errors. I'm [not sure I want](https://hackernoon.com/the-madness-of-parsing-real-world-javascript-regexps-d9ee336df983) to write a JS regex AST!
+
+## Ruby
+
+TODO: not so straightforward
+
+## PHP
+
+TODO
+
+## Golang
+
+Unless you specifically use a backtracking regex engine, Go code is safe from REDoS. The standard `regexp` package implements the `re2` approach, which matches in time linear in the length of the input.
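The README's "Starriness" and "Example" sections can be made concrete with a small JavaScript sketch. `estimatedSlowdown` and `buildExample` are hypothetical helpers, not part of regexploit; the regex is the README's own `abc*[a-z]+c+$`, and the repeat counts are deliberately far smaller than the 3456 that regexploit prints, so the demo finishes quickly.

```javascript
// Hypothetical helpers (not part of regexploit) illustrating the README.

// Polynomial REDoS of degree `starriness`: multiplying the length of the
// vulnerable part by `lengthRatio` multiplies the work by about lengthRatio ** starriness.
function estimatedSlowdown(starriness, lengthRatio) {
  return lengthRatio ** starriness;
}

// Required prefix + long run of the repeated character + optional "killer"
// character that stops the overall match from succeeding.
function buildExample(prefix, repeated, count, killer = '') {
  return prefix + repeated.repeat(count) + killer;
}

const vulnerable = /abc*[a-z]+c+$/; // the README's example, starriness 3

console.log(estimatedSlowdown(3, 2)); // 8, matching the README's 2^3
console.log(vulnerable.test(buildExample('ab', 'c', 20))); // true: no killer, so it matches
console.log(vulnerable.test(buildExample('ab', 'c', 20, '0'))); // false: '0' is outside [a-z]

// Small counts only: with the killer present, the time should climb steeply as n doubles.
for (const n of [100, 200, 400]) {
  const input = buildExample('ab', 'c', n, '0');
  const start = process.hrtime.bigint();
  vulnerable.test(input);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`n=${n}: ${ms.toFixed(2)} ms`);
}
```

The timings depend on the engine and machine; if the engine backtracks as described, each doubling of `n` should cost roughly eight times as much.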
diff --git a/regexploit/ast/branch.py b/regexploit/ast/branch.py
@@ -33,7 +33,7 @@ def overall_character_class(self) -> Optional[Character]:
         return c
 
     def maximal_character_class(self):
-        raise NotImplementedError
+        return None  # Really?
 
     def example(self) -> str:
         if self.optional:

diff --git a/regexploit/bin/javascript/cli.js b/regexploit/bin/javascript/cli.js
@@ -0,0 +1,27 @@
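+const fs = require('fs').promises;
+const findRegex = require('./find');
+
+module.exports = {
+  // Read a file and yield one JSON line per regex found in it.
+  async * parseFile(filename) {
+    try {
+      // Decode as UTF-8 so the parser receives a string rather than a Buffer
+      const code = await fs.readFile(filename, 'utf8');
+      yield* this.parseCode(code, filename);
+    } catch (error) {
+      // An Error's message is non-enumerable, so copy it in explicitly
+      yield JSON.stringify({ error: { ...error, message: String(error) }, filename });
+    }
+  },
+
+  // Parse source code and yield one JSON line per regex found in it.
+  * parseCode(code, filename) {
+    try {
+      for (const regex of findRegex.extractRegexesFromSource(code)) {
+        yield JSON.stringify({
+          ...regex,
+          filename,
+        });
+      }
+    } catch (error) {
+      yield JSON.stringify({ error: { ...error, message: String(error) }, filename });
+    }
+  },
+};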
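`cli.js` emits one JSON object per line: either a regex (`pattern`, `flags`, `filename`) or an error record with an `error` key. In this commit the real consumer is the Python REDoS finder, which is not shown here; the `splitOutput` function below is a hypothetical JavaScript illustration of that line-oriented contract, not regexploit code.

```javascript
// Hypothetical consumer for the JSON-lines output of regexploit/bin/javascript.
// Each non-blank line is either {pattern, flags, filename} or {error, filename}.
function splitOutput(text) {
  const regexes = [];
  const errors = [];
  for (const line of text.split('\n')) {
    if (!line.trim()) continue; // skip blank lines, e.g. a trailing newline
    const record = JSON.parse(line);
    if ('error' in record) {
      errors.push(record);
    } else {
      regexes.push(record);
    }
  }
  return { regexes, errors };
}

const sample = [
  JSON.stringify({ pattern: 'a+$', flags: 'i', filename: 'a.js' }),
  JSON.stringify({ error: { message: 'SyntaxError: Unexpected token' }, filename: 'b.js' }),
].join('\n');

const { regexes, errors } = splitOutput(sample);
console.log(regexes.length, errors.length); // 1 1
```

Keeping one record per line means a parse failure in one file does not stop the regexes from other files being reported.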
diff --git a/regexploit/bin/javascript/find.js b/regexploit/bin/javascript/find.js
@@ -0,0 +1,42 @@
+const parser = require('@typescript-eslint/parser');
+
+module.exports = {
+    * extractRegexesFromSource(content) {
+        // options https://github.com/typescript-eslint/typescript-eslint/blob/master/packages/types/src/parser-options.ts
+        const tree = parser.parse(content, {
+            ecmaFeatures: {
+                jsx: true
+            },
+            ecmaVersion: 9,
+            errorOnTypeScriptSyntacticAndSemanticIssues: false,
+            errorOnUnknownASTType: false,
+            range: true,
+        });
+        yield* this.walkASTForRegexes(tree);
+    },
+
+    * walkASTForRegexes(tree) {
+        if (!tree) {
+            return;
+        }
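+        // Regex literal, e.g. /ab+c/i: ESTree stores { pattern, flags } on `regex`
+        if (tree.regex) {
+            yield tree.regex;
+            return;
+        }
+        // new RegExp('...') or RegExp('...') with a string literal as the pattern;
+        // guard against calls with no arguments such as `new RegExp()`
+        if (
+            (tree.type == 'NewExpression' || tree.type == 'CallExpression') &&
+            tree.callee && tree.callee.name == 'RegExp' &&
+            tree.arguments && tree.arguments.length > 0 &&
+            tree.arguments[0].type == 'Literal' && typeof tree.arguments[0].value == 'string'
+        ) {
+            yield {
+                'pattern': tree.arguments[0].value,
+                'flags': tree.arguments.length > 1 && tree.arguments[1].type == 'Literal' ? tree.arguments[1].value : ''
+            };
+            return;
+        }
+        // Recurse into every child object (arrays of nodes are objects too)
+        for (const element of Object.values(tree)) {
+            if (element && typeof element == 'object') {
+                yield* this.walkASTForRegexes(element);
+            }
+        }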
+    }
+}
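The walk in `find.js` can be tried without installing `@typescript-eslint/parser`. The sketch below is a dependency-free restatement of the same idea, run on a simplified, hand-written ESTree-like tree (declarations omitted) standing in for real parser output; `walkForRegexes` is a hypothetical name, and the tree shape is an assumption based on the ESTree `regex`, `NewExpression` and `CallExpression` node types the original code checks.

```javascript
// Dependency-free sketch of the regex-finding walk in find.js.
function* walkForRegexes(node) {
  if (!node || typeof node !== 'object') return; // skip primitives and null
  if (node.regex) {
    // Regex literal: ESTree puts { pattern, flags } on `regex`
    yield { pattern: node.regex.pattern, flags: node.regex.flags };
    return;
  }
  const args = node.arguments;
  if ((node.type === 'NewExpression' || node.type === 'CallExpression') &&
      node.callee && node.callee.name === 'RegExp' &&
      Array.isArray(args) && args.length > 0 &&
      args[0].type === 'Literal' && typeof args[0].value === 'string') {
    // RegExp constructor with a string-literal pattern; flags only if also a literal
    const flags = args.length > 1 && args[1].type === 'Literal' ? args[1].value : '';
    yield { pattern: args[0].value, flags };
    return;
  }
  for (const child of Object.values(node)) {
    yield* walkForRegexes(child); // arrays are walked element by element
  }
}

// Simplified tree for: /x+y/g; new RegExp('(a*)*$'); RegExp();
const tree = {
  type: 'Program',
  body: [
    { type: 'Literal', value: null, regex: { pattern: 'x+y', flags: 'g' } },
    { type: 'NewExpression', callee: { type: 'Identifier', name: 'RegExp' },
      arguments: [{ type: 'Literal', value: '(a*)*$' }] },
    { type: 'CallExpression', callee: { type: 'Identifier', name: 'RegExp' }, arguments: [] },
  ],
};

console.log([...walkForRegexes(tree)]);
```

The argument-less `RegExp()` call is included on purpose: without the length check, reading `arguments[0].type` would throw a `TypeError`.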
diff --git a/regexploit/bin/javascript/index.js b/regexploit/bin/javascript/index.js
@@ -0,0 +1,26 @@
+#!/usr/bin/env node
+const readline = require('readline');
+const cli = require('./cli');
+
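+const args = process.argv.slice(2);
+
+if (args.length == 1 && args[0] == '-') {
+  // Read source code from stdin. readline strips line endings, so add them back;
+  // otherwise a `//` comment would swallow every following line.
+  process.stdin.setEncoding('utf-8');
+  let data = '';
+  readline.createInterface({input: process.stdin})
+    .on('line', l => data += l + '\n')
+    .on('close', () => {
+      for (const output of cli.parseCode(data)) {
+        console.log(output);
+      }
+    });
+} else {
+  // Otherwise treat every argument as a filename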
+  (async () => {
+    for (const filename of args) {
+      for await (let output of cli.parseFile(filename)) {
+        console.log(output);
+      }
+    }
+  })()
+}
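An alternative to rebuilding stdin line by line with `readline` is to collect the whole stream at once, which keeps the original newlines by construction. The sketch below is an assumption about how that could look, not part of this commit; `readAll` is a hypothetical helper and an in-memory `Readable` stands in for `process.stdin`.

```javascript
const { Readable } = require('stream');

// Collect a whole readable stream (e.g. process.stdin) into one UTF-8 string.
// Unlike joining readline 'line' events, this keeps the original newlines.
async function readAll(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    // Chunks are Buffers or strings depending on the stream's mode/encoding
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('utf8');
}

// Demo with an in-memory stream standing in for stdin
const source = Readable.from(['// a comment\n', 'const re = /(a+)+$/;\n']);
readAll(source).then((code) => {
  console.log(JSON.stringify(code));
});
```

In `index.js`, the stdin branch could then await `readAll(process.stdin)` and pass the result to `cli.parseCode`; concatenating raw bytes before decoding also avoids splitting a multi-byte character across chunk boundaries.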