MinHash-based Code Relationship & Investigation Toolkit (MCRIT)

MCRIT is a framework created to simplify the application of the MinHash algorithm in the context of code similarity. It can be used to rapidly implement "shinglers", i.e. methods which encode properties of disassembled functions, to then be used for similarity estimation via the MinHash algorithm. It is tailored to work with disassembly reports emitted by SMDA.
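To make the idea concrete, here is a minimal, self-contained sketch of shingling plus MinHash. It is not MCRIT's implementation: the helper names (`shingle`, `minhash_signature`, `estimate_similarity`), the choice of mnemonic 3-grams as shingles, and the use of salted BLAKE2b as the hash family are all assumptions made for illustration.

```python
import hashlib


def shingle(tokens, k=3):
    """Return the set of k-grams (shingles) over a token sequence."""
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(shingles, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over all shingles."""
    if not shingles:
        raise ValueError("cannot compute a MinHash signature for an empty shingle set")
    signature = []
    for seed in range(num_hashes):
        # A different salt per seed gives a different hash function (hypothetical choice).
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "little"
            )
            for s in shingles
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """The fraction of matching signature positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Two made-up mnemonic sequences that differ in a single instruction.
func_a = ["push", "mov", "sub", "mov", "call", "add", "pop", "ret"]
func_b = ["push", "mov", "sub", "mov", "call", "xor", "pop", "ret"]

sig_a = minhash_signature(shingle(func_a))
sig_b = minhash_signature(shingle(func_b))
print(f"estimated similarity: {estimate_similarity(sig_a, sig_b):.2f}")
```

Signatures have a fixed length regardless of function size, so comparing two functions costs the same no matter how many shingles each produced.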

Usage

Dockerized Usage

We highly recommend using the fully packaged docker-mcrit for trivial deployment and usage.
First and foremost, this will ensure that you have fully compatible versions across all components, including a database for persistence and a web frontend for convenient interaction.

Standalone Usage

Installing MCRIT on its own will require some more steps.
For the following, we assume Ubuntu as the host operating system.

The Python installation requirements are listed in requirements.txt and can be installed using:

# install python and MCRIT dependencies
$ sudo apt install python3 python3-pip
$ pip install -r requirements.txt 

By default, MongoDB 5.0 is used as the backend, which is also the recommended mode of operation as it provides persistent data storage. The following commands outline an example installation on Ubuntu:

# fetch mongodb signing key
$ sudo apt-get install gnupg
$ wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
# add package repository (Ubuntu 22.04)
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
# OR add package repository (Ubuntu 20.04)
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
# OR add package repository (Ubuntu 18.04)
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
# install mongodb
$ sudo apt-get update
$ sudo apt-get install -y mongodb-org
# start mongodb as a service
$ sudo systemctl start mongod
# optionally configure to start the service with system startup
$ sudo systemctl enable mongod

When doing the standalone installation, you may want to install the MCRIT module from the cloned repository, like so:

$ pip install -e .

After this initial installation, MCRIT can be used without an internet connection if desired.

Operation

The MCRIT backend is generally divided into two components: a server providing an API to work with, and one or more workers processing queued jobs. They can be started in separate shells using:

$ python -m mcrit server

and

$ python -m mcrit worker

By default, the REST API server will be listening on http://127.0.0.1:8000/.
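Before sending requests, it can help to confirm that something is actually accepting connections on that address. The following sketch only opens a TCP connection and does not call any MCRIT endpoint; the helper name `is_listening` is made up for this example, and the defaults mirror the address given above.

```python
import socket


def is_listening(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_listening() else "not reachable"
    print(f"MCRIT server at 127.0.0.1:8000 is {state}")
```

A successful connection only shows that the port is open, not that the process behind it is MCRIT.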

Interaction

Regardless of your choice for installation, once running you can interact with the MCRIT backend.

MCRIT Client

We have created a Python client module that is capable of working with all available endpoints of the server.
Documentation for this client module is currently in development.

MCRIT CLI

There is also a CLI based on this client package. Examples:

# query some stats of the data stored in the backend 
$ python -m mcrit client status
{'status': {'db_state': 187, 'storage_type': 'mongodb', 'num_bands': 20, 'num_samples': 137, 'num_families': 14, 'num_functions': 129110, 'num_pichashes': 25385}}
# submit a malware sample with filename sample_unpacked, using family name "some_family"
$ python -m mcrit client submit sample_unpacked -f some_family
 1.039s -> (architecture: intel.32bit, base_addr: 0x10000000): 634 functions
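The status line shown above uses single quotes, so it reads as a Python dict literal rather than JSON. Assuming the CLI keeps printing it in this form (an assumption based only on the sample output, not a documented guarantee), a script could parse it with `ast.literal_eval`. The helper name `parse_status` is made up for this sketch.

```python
import ast

# The status line exactly as printed in the example above.
status_line = (
    "{'status': {'db_state': 187, 'storage_type': 'mongodb', 'num_bands': 20, "
    "'num_samples': 137, 'num_families': 14, 'num_functions': 129110, "
    "'num_pichashes': 25385}}"
)


def parse_status(line):
    """Parse a dict-literal status line and return the inner 'status' mapping."""
    parsed = ast.literal_eval(line)  # safe: evaluates literals only, never code
    if not isinstance(parsed, dict) or "status" not in parsed:
        raise ValueError("unexpected status output format")
    return parsed["status"]


status = parse_status(status_line)
functions_per_sample = status["num_functions"] / status["num_samples"]
print(f"{status['num_samples']} samples in {status['num_families']} families, "
      f"{functions_per_sample:.1f} functions per sample on average")
```

Passing the same line to `json.loads` would fail because of the single quotes, which is why `ast.literal_eval` is used here.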

More extensive documentation of the MCRIT CLI is available here.

MCRIT IDA Plugin

An IDA plugin is also currently under development. To use it, first create your own config.py and make the required changes depending on the deployment of your MCRIT instance:

cp ./plugins/ida/template.config.py ./plugins/ida/config.py
nano ./plugins/ida/config.py

Then simply run the script found at

./plugins/ida/ida_mcrit.py

in IDA.

Reference Data

In July 2023, we started populating a GitHub repository which contains ready-to-use reference data for common compilers and libraries.

Version History

  • 2023-07-28 v1.0.8: IDA plugin can now display colored graphs for remote functions and do queries for PicBlockHashes (for basic blocks) for the currently viewed function.
  • 2023-06-06 v1.0.7: Extended filtering capabilities on MatchingResult.
  • 2023-06-02 v1.0.6: IDA plugin can now task matching jobs, show their results and batch import labels. Harmonization of MatchingResult.
  • 2023-05-22 v1.0.3: More robustness for path verification when using MCRIT CLI on Malpedia repo folder.
  • 2023-05-12 v1.0.1: Some progress on label import for the IDA plugin. Reflected API extension of MCRITweb in McritClient.
  • 2023-04-10 v1.0.0: Milestone release for Botconf 2023.
  • 2023-04-10 v0.25.0: IDA plugin can now do function queries for the currently viewed function.
  • 2023-03-24 v0.24.2: McritClient can forward username/apitoken, addJsonReport is now forwardable.
  • 2023-03-21 v0.24.0: FunctionEntries can now store additional FunctionLabelEntries, along with the submitting user/date.
  • 2023-03-17 v0.23.0: It is now possible to query matches for single SmdaFunctions (synchronously).
  • 2023-03-15 v0.22.0: McritClient now supports apitokens and raw responses for a subset of functionality.
  • 2023-03-14 v0.21.0: Backend support for more fine-grained filtering.
  • 2023-03-13 v0.20.6: Backend support for filtering family/sample by score in MatchResult.
  • 2023-02-22 v0.20.4: Bugfix for calculating unique scores and accessing these results.
  • 2023-02-21 v0.20.3: Supporting frontend capabilities with result presentation.
  • 2023-02-17 v0.20.2: Extended match report object to support frontend improvements.
  • 2023-02-14 v0.20.0: Overhauled console client to simplify shell-based interactions with the backend.
  • 2023-01-12 v0.19.4: Additional filtering capabilities for MatchingResults.
  • 2022-12-13 v0.19.1: It is now possible to require specific (higher) amounts of band matches for candidates (i.e. reduce fuzziness of matching).
  • 2022-12-13 v0.18.x: Enable matching of arbitrary function IDs.
  • 2022-11-25 v0.18.9: Accelerated Query matching.
  • 2022-11-18 v0.18.8: Harmonized handling of deletion and modifications, minor fixes.
  • 2022-11-13 v0.18.7: Drastically accelerated sample deletion.
  • 2022-11-13 v0.18.6: Added functionality to modify existing sample and family information.
  • 2022-11-11 v0.18.2: Upgrading matching procedure, should now be able to handle larger binaries more robustly and efficiently.
  • 2022-11-03 v0.18.1: Minor fixes.
  • 2022-11-03 v0.18.0: Unique block isolation now also generates a proposal for a YARA rule, restructured result output.
  • 2022-10-24 v0.17.4: Harmonized setup.py with requirements, improved memory efficiency for processing cross jobs.
  • 2022-10-18 v0.17.3: Added a convenience script to recursively produce SMDA reports from a semi-structured folder.
  • 2022-10-13 v0.17.2: Fixed potential OOM issues during MinHash calculation by processing functions to be hashed in smaller batches.
  • 2022-10-12 v0.17.1: Added a function to schedule a job that will ensure minhashes have been calculated for all samples/functions.
  • 2022-10-11 v0.17.0: Search for unique blocks is now an asynchronous job through the Worker.
  • 2022-10-11 v0.16.0: Samples from MatchQuery jobs will now be stored with their Sample/FunctionEntries to allow better post processing.
  • 2022-10-04 v0.15.4: Server can now display its version.
  • 2022-09-28 v0.15.3: Addressing performance issues for bigger instances, generating escaped instruction sequence for unique blocks.
  • 2022-09-26 v0.15.0: CrossJobs now in backend, started to provide functionality to identify unique basic blocks in samples.
  • 2022-08-29 v0.14.2: Minor fixes for deployment.
  • 2022-08-22 v0.14.0: Jobs can now depend on other jobs (preparation for moving crossjobs to backend), QoL improvements to job handling.
  • 2022-08-17 v0.13.1: Added commandline option for profiling (requires cProfile).
  • 2022-08-09 v0.13.0: Can now do efficient direct queries for PicHash and PicBlockHash matches.
  • 2022-08-09 v0.12.3: Bugfix for FamilyEntry.
  • 2022-08-08 v0.12.2: Bugfix for delivery of XCFG data, added missing dependency.
  • 2022-08-08 v0.12.0: Integrated Advanced Search syntax.
  • 2022-08-03 v0.11.0: (BREAKING) Families are now represented with a FamilyEntry.
  • 2022-08-03 v0.10.3: Now leaving function xcfg data by default in DB, exposed access to it via REST API and McritClient.
  • 2022-07-29 v0.10.2: Added ability to delete families - now also keeping XCFG info for all functions by default.
  • 2022-07-12 v0.10.1: Improved performance.
  • 2022-07-12 v0.10.0: (BREAKING) Job handling simplified.
  • 2022-05-13 v0.9.4: Bug fix for receiving submitted files.
  • 2022-05-13 v0.9.3: Further updates to MatchingResults.
  • 2022-05-13 v0.9.2: Added another field and more convenience functions in MatchingResult for better access - those are breaking changes for previously created MatchingResults.
  • 2022-05-05 v0.9.1: Processing of binary submissions, minor fixes for minhash queuing - INITIAL RELEASE.
  • 2022-02-09 v0.9.0: Added PicBlocks to MCRIT.
  • 2022-01-19 v0.8.0: Migrated the client and the examples into the primary MCRIT repository.
  • 2021-12-16 v0.7.0: Initial private release.

Credits & Notes

Thanks to Steffen Enders and Paul Hordiienko for their contributions to the internal research prototype of this project! Thanks to Manuel Blatt for his extensive contributions to and refactorings of this project as well as for the client module!

Pull requests welcome! :)

License

    MinHash-based Code Relationship & Investigation Toolkit (MCRIT)
    Copyright (C) 2022  Daniel Plohmann, Manuel Blatt

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
    
    Some plug-ins and libraries may have different licenses. 
    If so, a license file is provided in the plug-in's folder.
