Skip to content

Commit

Permalink
Initial commit
Browse files Browse the repository at this point in the history
Initial public release
  • Loading branch information
skochinsky authored Jan 24, 2023
0 parents commit 4e42a98
Show file tree
Hide file tree
Showing 35 changed files with 3,690 additions and 0 deletions.
40 changes: 40 additions & 0 deletions BUILDING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@

# Bulding gooMBA

## dependencies

gooMBA requires IDA SDK (8.2 or later) and the [z3 library](https://github.com/Z3Prover/z3).

## Building

1. After unpacking and setting up the SDK, copy goomba source tree under SDK's `plugins` directory,
for example `C:\idasdk_pro82\plugins\goomba`.

2. Download and extract [z3 build for your OS](https://github.com/Z3Prover/z3/releases) into the `z3` subdirectory, for example:

`z3/z3-4.11.2-x64-win`

Under it, you should have `bin` and `include` directories

Alternatively, set `Z3_BIN` and `Z3_INCLUDE` to point to the directories elsewhere.

3. If using another `z3` version than 4.11.2, open `makefile` and adjust

* `Z3VER` with the correct version number
* possibly, `Z3BIN` with the correct glibc version number (on Linux),
or osx version number (on macOS)

4. build the necessary version of gooMBA, for example:

```make -j``` for 32-bit IDA
```make __EA64__=1 -j``` for IDA64

5. Copy generated files from SDK's bin directory to your IDA install (or [user directory](https://hex-rays.com/blog/igors-tip-of-the-week-33-idas-user-directory-idausr/)):

* `C:\idasdk_pro82\bin\plugins\goomba*` -> `C:\Program Files\IDA Pro 8.2\plugins\`
* `C:\idasdk_pro82\bin\cfg\goomba.cfg` -> `C:\Program Files\IDA Pro 8.2\cfg\`
* `C:\idasdk_pro82\bin\libz3.*` -> `C:\Program Files\IDA Pro 8.2\`




52 changes: 52 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# gooMBA

gooMBA is a Hex-Rays Decompiler plugin that simplifies Mixed Boolean-Arithmetic
(MBA) expressions. It achieves this using several heuristics and algorithms to
achieve orders-of-magnitude better performance than existing state-of-the-art
solutions.

More information on the inner workings of this tool is available in our [blog
post](https://hex-rays.com/blog/deobfuscation-with-goomba/).

## Core Features
- Full integration with the Hex-Rays Decompiler
- Simplifies linear MBAs, including opaque predicates
- Handles sign extension for linear functions
- Verifies soundness of simplifications using the z3 SMT solver
- Simplifies non-linear MBAs with the use of a function fingerprint oracle

## Usage

By default, the plugin does not run automatically. You can invoke the plugin
by right clicking in the pseudocode view and selecting "Run gooMBA Optimizer".
In addition, you can set up a keyboard shortcut in IDA by opening Options ->
Shortcuts... and adding a shortcut for the `goomba:run` action.

Several options for usage are available within `goomba.cfg`. You can set up a
fingerprint oracle, configure the z3 proof timeout time, choose the desired behavior when
timeouts occur, and choose to make the plugin run automatically without needing
to be invoked from the right-click menu.

## Demo

The sample database `tests/idb/mba_challenge.i64` was created from the `mba_challenge` binary. The functions
`mba1`, `mba2`, `mba3`, `mba`, `solve_me` contain MBA expressions of varying complexity.

For example, the `mba1` function's initial pseudocode:
![mba1 initial pseudocode](./images/mba1_before.png)

And after running gooMBA optimization:
![mba1 pseudocode optimized](./images/mba1_after.png)


## Fingerprint oracle

The oracle can be used for simplifying non-linear MBAs.
The input for generaring it is a list of candidate expressions in [msynth](https://github.com/mrphrazer/msynth) syntax.
You can use `generate_oracle.sh` or `generate_oracle.bat` to generate a binary
oracle file which can then be used by the plugin by specifying the path to it
in `goomba.cfg` (parameter `MBA_ORACLE_PATH`).

A large pre-computed oracle is available [here](https://hex-rays.com/products/ida/support/freefiles/goomba-oracle.7z)

NOTE: oracle files generated with IDA 8.2 can only be used with 64-bit binaries, otherwise you may hit internal error 30661.
168 changes: 168 additions & 0 deletions bitwise_expr_lookup_tbl.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,168 @@
/*
* Copyright (c) 2023 by Hex-Rays, [email protected]
* ALL RIGHTS RESERVED.
*
* gooMBA plugin for Hex-Rays Decompiler.
*
*/

#include "z3++_no_warn.h"
#include "bitwise_expr_lookup_tbl.hpp"

bw_expr_tbl_t bw_expr_tbl_t::instance;

bw_expr_tbl_t::bw_expr_tbl_t()
{
minsn_templates_t X;
X.push_back(std::make_shared<mt_varref_t>(0));
X.push_back(std::make_shared<mt_varref_t>(1));
X.push_back(std::make_shared<mt_varref_t>(2));

minsn_template_ptr_t zero = std::make_shared<mt_constant_t>(0ull);

// note that all expressions are ordered by the numeric value of the instruction trace
// see lin_conj_exprs.hpp for more info on ordering.
auto &onevar = tbl.push_back();
onevar.push_back(zero); // [0 0]
onevar.push_back(X[0]); // [0 1]

auto &twovar = tbl.push_back();
twovar.push_back(zero); // [0 0 0 0]
twovar.push_back(X[0]&~X[1]); // [0 1 0 0]
twovar.push_back(~(X[0]|~X[1])); // [0 0 1 0]
twovar.push_back(X[0]^X[1]); // [0 1 1 0]
twovar.push_back(X[0]&X[1]); // [0 0 0 1]
twovar.push_back(X[0]); // [0 1 0 1]
twovar.push_back(X[1]); // [0 0 1 1]
twovar.push_back(X[0]|X[1]); // [0 1 1 1]

auto &threevar = tbl.push_back();
threevar.push_back(zero); // [0 0 0 0 0 0 0 0]
threevar.push_back(~(~X[0]|(X[1]|X[2]))); // [0 1 0 0 0 0 0 0]
threevar.push_back(~(X[0]|(~X[1]|X[2]))); // [0 0 1 0 0 0 0 0]
threevar.push_back(~X[2]&(X[0]^X[1])); // [0 1 1 0 0 0 0 0]
threevar.push_back(~(~X[0]|(~X[1]|X[2]))); // [0 0 0 1 0 0 0 0]
threevar.push_back(X[0]&~X[2]); // [0 1 0 1 0 0 0 0]
threevar.push_back(X[1]&~X[2]); // [0 0 1 1 0 0 0 0]
threevar.push_back(X[2]^(X[0]|(X[1]|X[2]))); // [0 1 1 1 0 0 0 0]
threevar.push_back(~X[0]&(~X[1]&X[2])); // [0 0 0 0 1 0 0 0]
threevar.push_back(~X[1]&(X[0]^X[2])); // [0 1 0 0 1 0 0 0]
threevar.push_back(~X[0]&(X[1]^X[2])); // [0 0 1 0 1 0 0 0]
threevar.push_back(~(X[0]&X[1])&(X[0]^(X[1]^X[2]))); // [0 1 1 0 1 0 0 0]
threevar.push_back(~(X[0]^X[1])&(X[0]^X[2])); // [0 0 0 1 1 0 0 0]
threevar.push_back(X[2]^(X[0]|(X[1]&X[2]))); // [0 1 0 1 1 0 0 0]
threevar.push_back(~(X[0]&~X[1])&(X[1]^X[2])); // [0 0 1 1 1 0 0 0]
threevar.push_back(X[2]^(X[0]|X[1])); // [0 1 1 1 1 0 0 0]
threevar.push_back(X[0]&(~X[1]&X[2])); // [0 0 0 0 0 1 0 0]
threevar.push_back(X[0]&~X[1]); // [0 1 0 0 0 1 0 0]
threevar.push_back((X[0]^X[1])&~(X[0]^X[2])); // [0 0 1 0 0 1 0 0]
threevar.push_back(X[1]^(X[0]|(X[1]&X[2]))); // [0 1 1 0 0 1 0 0]
threevar.push_back(X[0]&(X[1]^X[2])); // [0 0 0 1 0 1 0 0]
threevar.push_back(~(~X[0]|(X[1]&X[2]))); // [0 1 0 1 0 1 0 0]
threevar.push_back((X[0]|X[1])&(X[1]^X[2])); // [0 0 1 1 0 1 0 0]
threevar.push_back((X[0]&X[1])^~(X[0]^(~X[1]|X[2]))); // [0 1 1 1 0 1 0 0]
threevar.push_back(~(X[1]|~X[2])); // [0 0 0 0 1 1 0 0]
threevar.push_back(X[1]^(X[0]|(X[1]|X[2]))); // [0 1 0 0 1 1 0 0]
threevar.push_back(~(X[0]&X[1])&(X[1]^X[2])); // [0 0 1 0 1 1 0 0]
threevar.push_back(X[1]^(X[0]|X[2])); // [0 1 1 0 1 1 0 0]
threevar.push_back((X[0]|~X[1])&(X[1]^X[2])); // [0 0 0 1 1 1 0 0]
threevar.push_back((X[0]&X[2])^(X[0]^(~X[1]&X[2]))); // [0 1 0 1 1 1 0 0]
threevar.push_back(X[1]^X[2]); // [0 0 1 1 1 1 0 0]
threevar.push_back((X[0]&~X[1])|(X[1]^X[2])); // [0 1 1 1 1 1 0 0]
threevar.push_back(~X[0]&(X[1]&X[2])); // [0 0 0 0 0 0 1 0]
threevar.push_back((X[0]^X[1])&(X[0]^X[2])); // [0 1 0 0 0 0 1 0]
threevar.push_back(~(X[0]|~X[1])); // [0 0 1 0 0 0 1 0]
threevar.push_back(X[1]^~(~X[0]|(~X[1]&X[2]))); // [0 1 1 0 0 0 1 0]
threevar.push_back(X[1]&(X[0]^X[2])); // [0 0 0 1 0 0 1 0]
threevar.push_back(X[2]^(X[0]|(~X[1]&X[2]))); // [0 1 0 1 0 0 1 0]
threevar.push_back(X[1]&~(X[0]&X[2])); // [0 0 1 1 0 0 1 0]
threevar.push_back(X[1]^~(~X[0]|(X[1]^X[2]))); // [0 1 1 1 0 0 1 0]
threevar.push_back(~(X[0]|~X[2])); // [0 0 0 0 1 0 1 0]
threevar.push_back(X[2]^(X[0]&(~X[1]|X[2]))); // [0 1 0 0 1 0 1 0]
threevar.push_back(~X[0]&(X[1]|X[2])); // [0 0 1 0 1 0 1 0]
threevar.push_back(X[0]^(X[1]|X[2])); // [0 1 1 0 1 0 1 0]
threevar.push_back(X[2]^(X[0]&(X[1]|X[2]))); // [0 0 0 1 1 0 1 0]
threevar.push_back(X[0]^X[2]); // [0 1 0 1 1 0 1 0]
threevar.push_back((X[0]&X[2])^(X[1]|X[2])); // [0 0 1 1 1 0 1 0]
threevar.push_back(X[2]^~(~X[0]&(~X[1]|X[2]))); // [0 1 1 1 1 0 1 0]
threevar.push_back(X[2]&(X[0]^X[1])); // [0 0 0 0 0 1 1 0]
threevar.push_back(X[1]^~(~X[0]&(~X[1]|X[2]))); // [0 1 0 0 0 1 1 0]
threevar.push_back(X[1]^(X[0]&(X[1]|X[2]))); // [0 0 1 0 0 1 1 0]
threevar.push_back(X[0]^X[1]); // [0 1 1 0 0 1 1 0]
threevar.push_back((X[0]|X[1])&~(X[0]^(X[1]^X[2]))); // [0 0 0 1 0 1 1 0]
threevar.push_back(X[0]^(X[1]&X[2])); // [0 1 0 1 0 1 1 0]
threevar.push_back(X[1]^(X[0]&X[2])); // [0 0 1 1 0 1 1 0]
threevar.push_back(X[1]^(X[0]&(~X[1]|X[2]))); // [0 1 1 1 0 1 1 0]
threevar.push_back(X[2]&~(X[0]&X[1])); // [0 0 0 0 1 1 1 0]
threevar.push_back(X[1]^(X[0]|(X[1]^X[2]))); // [0 1 0 0 1 1 1 0]
threevar.push_back((X[0]&X[1])^(X[1]|X[2])); // [0 0 1 0 1 1 1 0]
threevar.push_back(X[1]^(X[0]|(~X[1]&X[2]))); // [0 1 1 0 1 1 1 0]
threevar.push_back(X[2]^(X[0]&X[1])); // [0 0 0 1 1 1 1 0]
threevar.push_back(X[2]^~(~X[0]|(~X[1]&X[2]))); // [0 1 0 1 1 1 1 0]
threevar.push_back(~(X[0]|~X[1])|(X[1]^X[2])); // [0 0 1 1 1 1 1 0]
threevar.push_back((X[0]^X[1])|(X[0]^X[2])); // [0 1 1 1 1 1 1 0]
threevar.push_back(X[0]&(X[1]&X[2])); // [0 0 0 0 0 0 0 1]
threevar.push_back(~(~X[0]|(X[1]^X[2]))); // [0 1 0 0 0 0 0 1]
threevar.push_back(X[1]&~(X[0]^X[2])); // [0 0 1 0 0 0 0 1]
threevar.push_back((X[0]|X[1])&(X[0]^(X[1]^X[2]))); // [0 1 1 0 0 0 0 1]
threevar.push_back(X[0]&X[1]); // [0 0 0 1 0 0 0 1]
threevar.push_back(~(~X[0]|(~X[1]&X[2]))); // [0 1 0 1 0 0 0 1]
threevar.push_back(X[1]&(X[0]|~X[2])); // [0 0 1 1 0 0 0 1]
threevar.push_back((X[1]&~X[2])|~(~X[0]|(~X[1]&X[2]))); // [0 1 1 1 0 0 0 1]
threevar.push_back(X[2]&~(X[0]^X[1])); // [0 0 0 0 1 0 0 1]
threevar.push_back((X[0]|~X[1])&(X[0]^(X[1]^X[2]))); // [0 1 0 0 1 0 0 1]
threevar.push_back(~(X[0]&~X[1])&(X[0]^(X[1]^X[2]))); // [0 0 1 0 1 0 0 1]
threevar.push_back(X[0]^(X[1]^X[2])); // [0 1 1 0 1 0 0 1]
threevar.push_back(X[1]^(~X[0]&(X[1]|X[2]))); // [0 0 0 1 1 0 0 1]
threevar.push_back(X[0]^(~X[1]&X[2])); // [0 1 0 1 1 0 0 1]
threevar.push_back(X[1]^~(X[0]|~X[2])); // [0 0 1 1 1 0 0 1]
threevar.push_back((X[0]&X[1])|(X[0]^(X[1]^X[2]))); // [0 1 1 1 1 0 0 1]
threevar.push_back(X[0]&X[2]); // [0 0 0 0 0 1 0 1]
threevar.push_back(X[0]&(~X[1]|X[2])); // [0 1 0 0 0 1 0 1]
threevar.push_back(X[2]^(~X[0]&(X[1]|X[2]))); // [0 0 1 0 0 1 0 1]
threevar.push_back(~(X[0]^(~X[1]|X[2]))); // [0 1 1 0 0 1 0 1]
threevar.push_back(X[0]&(X[1]|X[2])); // [0 0 0 1 0 1 0 1]
threevar.push_back(X[0]); // [0 1 0 1 0 1 0 1]
threevar.push_back((X[0]&X[2])|(X[1]&~X[2])); // [0 0 1 1 0 1 0 1]
threevar.push_back(~(~X[0]&(~X[1]|X[2]))); // [0 1 1 1 0 1 0 1]
threevar.push_back(X[2]&(X[0]|~X[1])); // [0 0 0 0 1 1 0 1]
threevar.push_back((X[1]&~X[2])^(X[0]|(X[1]^X[2]))); // [0 1 0 0 1 1 0 1]
threevar.push_back(X[2]^~(X[0]|~X[1])); // [0 0 1 0 1 1 0 1]
threevar.push_back((X[0]&~X[1])|(X[0]^(X[1]^X[2]))); // [0 1 1 0 1 1 0 1]
threevar.push_back((X[0]&X[1])|~(X[1]|~X[2])); // [0 0 0 1 1 1 0 1]
threevar.push_back(X[0]|(~X[1]&X[2])); // [0 1 0 1 1 1 0 1]
threevar.push_back((X[0]&X[1])|(X[1]^X[2])); // [0 0 1 1 1 1 0 1]
threevar.push_back(X[0]|(X[1]^X[2])); // [0 1 1 1 1 1 0 1]
threevar.push_back(X[1]&X[2]); // [0 0 0 0 0 0 1 1]
threevar.push_back((X[0]|X[1])&~(X[1]^X[2])); // [0 1 0 0 0 0 1 1]
threevar.push_back(X[1]&~(X[0]&~X[2])); // [0 0 1 0 0 0 1 1]
threevar.push_back(X[1]^(X[0]&~X[2])); // [0 1 1 0 0 0 1 1]
threevar.push_back(X[1]&(X[0]|X[2])); // [0 0 0 1 0 0 1 1]
threevar.push_back((X[0]&X[2])^(X[0]^(X[1]&X[2]))); // [0 1 0 1 0 0 1 1]
threevar.push_back(X[1]); // [0 0 1 1 0 0 1 1]
threevar.push_back(X[1]|(X[0]&~X[2])); // [0 1 1 1 0 0 1 1]
threevar.push_back(X[2]&~(X[0]&~X[1])); // [0 0 0 0 1 0 1 1]
threevar.push_back(X[2]^(X[0]&~X[1])); // [0 1 0 0 1 0 1 1]
threevar.push_back((X[1]&X[2])|(~X[0]&(X[1]|X[2]))); // [0 0 1 0 1 0 1 1]
threevar.push_back(~(X[0]|~X[1])|(X[0]^(X[1]^X[2]))); // [0 1 1 0 1 0 1 1]
threevar.push_back(X[1]^(~X[0]&(X[1]^X[2]))); // [0 0 0 1 1 0 1 1]
threevar.push_back(X[2]^~(~X[0]|(X[1]&X[2]))); // [0 1 0 1 1 0 1 1]
threevar.push_back(X[1]|~(X[0]|~X[2])); // [0 0 1 1 1 0 1 1]
threevar.push_back(X[1]|(X[0]^X[2])); // [0 1 1 1 1 0 1 1]
threevar.push_back(X[2]&(X[0]|X[1])); // [0 0 0 0 0 1 1 1]
threevar.push_back((X[0]&X[1])^(X[0]^(X[1]&X[2]))); // [0 1 0 0 0 1 1 1]
threevar.push_back(X[1]^(X[0]&(X[1]^X[2]))); // [0 0 1 0 0 1 1 1]
threevar.push_back(X[1]^~(~X[0]|(X[1]&X[2]))); // [0 1 1 0 0 1 1 1]
threevar.push_back((X[1]&X[2])|(X[0]&(X[1]|X[2]))); // [0 0 0 1 0 1 1 1]
threevar.push_back(X[0]|(X[1]&X[2])); // [0 1 0 1 0 1 1 1]
threevar.push_back(X[1]|(X[0]&X[2])); // [0 0 1 1 0 1 1 1]
threevar.push_back(X[0]|X[1]); // [0 1 1 1 0 1 1 1]
threevar.push_back(X[2]); // [0 0 0 0 1 1 1 1]
threevar.push_back(X[2]|(X[0]&~X[1])); // [0 1 0 0 1 1 1 1]
threevar.push_back(X[2]|~(X[0]|~X[1])); // [0 0 1 0 1 1 1 1]
threevar.push_back(X[2]|(X[0]^X[1])); // [0 1 1 0 1 1 1 1]
threevar.push_back(X[2]|(X[0]&X[1])); // [0 0 0 1 1 1 1 1]
threevar.push_back(X[0]|X[2]); // [0 1 0 1 1 1 1 1]
threevar.push_back(X[1]|X[2]); // [0 0 1 1 1 1 1 1]
threevar.push_back(X[0]|(X[1]|X[2])); // [0 1 1 1 1 1 1 1]
}
43 changes: 43 additions & 0 deletions bitwise_expr_lookup_tbl.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
/*
* Copyright (c) 2023 by Hex-Rays, [email protected]
* ALL RIGHTS RESERVED.
*
* gooMBA plugin for Hex-Rays Decompiler.
*
*/

#pragma once
#include "minsn_template.hpp"

// bw_expr_tbl_t is a singleton class that maintains a lookup table mapping
// boolean function evaluation traces (i.e. I/O behavior) to the shortest
// representation of each boolean function.
// for instance, if you found a boolean function f(x, y) with the following
// behavior: f(0, 0) = 0, f(0, 1) = 0, f(1, 0) = 0, f(1, 1) = 1, then you
// can query this object to find that f(x, y) = x & y.
// note that we do not consider any functions that return 1 on the all-zeros
// input.
class bw_expr_tbl_t
{
qvector<minsn_templates_t> tbl;

public:
static bw_expr_tbl_t instance;

// do not call directly, use instance instead
bw_expr_tbl_t();

// eval_trace is a bitmap whose i'th bit contains the
// boolean function's evaluation on the i'th conjunction,
// where conjunctions are ordered in the same way as in lin_conj_exprs.hpp
minsn_template_ptr_t lookup(int nvars, uint64_t bit_trace)
{
QASSERT(30698, (bit_trace & 1) == 0);
QASSERT(30699, nvars <= 3);
QASSERT(30700, nvars >= 1);
QASSERT(30701, bit_trace < (1ull << (1ull << (nvars))));
return tbl[nvars-1][bit_trace >> 1];
// since the 0th conjunction is never considered, all vector indices are
// divided by 2. See the corresponding .cpp file for more info.
}
};
27 changes: 27 additions & 0 deletions consts.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
/*
* Copyright (c) 2023 by Hex-Rays, [email protected]
* ALL RIGHTS RESERVED.
*
* gooMBA plugin for Hex-Rays Decompiler.
*
*/

#pragma once
#define ACTION_NAME "goomba:run"
// Z3_TIMEOUT_MS defines the amount of time we allow the z3 theorem prover to
// take to prove any given statement
#define Z3_TIMEOUT_MS 1000

// Only used for *generating* oracles: how many test cases to run against each
// function to generate fingerprints. Note that an existing oracle will report
// its own number, and the below constant will not be used
#define TCS_PER_EQUIV_CLASS 128
// The number of inputs used when evaluating functions for fingerprinting
#define CANDIDATE_EXPR_NUMINPUTS 5
// The maximum number of candidates to consider which have the same fingerprint
// as the expression being simplified
#define EQUIV_CLASS_MAX_CANDIDATES 10
// The maximum number of fingerprints to consider for each expression being
// simplified -- this number is greater than one since we consider every
// possible assignment of input variables
#define EQUIV_CLASS_MAX_FINGERPRINTS 50
Loading

0 comments on commit 4e42a98

Please sign in to comment.