This repository is part of the Nebula: Self-Attention for Dynamic Malware Analysis project. Nebula and the alternative dynamic malware analysis models live in the `nebula/models` directory, and usage examples are in the `scripts/` directory.
The dataset used for experiments and pretraining can be downloaded from kaggle.com/datasets/dmitrijstrizna/quo-vadis-malware-emulation.
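As a convenience, the dataset can be fetched programmatically with the official Kaggle API client. This is a minimal sketch, not part of the Nebula package itself; it assumes the `kaggle` package is installed, API credentials are configured in `~/.kaggle/kaggle.json`, and the `data/` destination directory is an arbitrary choice:

```python
# Download the emulation dataset via the Kaggle API client (assumes `pip install kaggle`
# and a configured Kaggle API token).
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
api.dataset_download_files(
    "dmitrijstrizna/quo-vadis-malware-emulation",
    path="data",   # arbitrary local destination directory
    unzip=True,
)
```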
Model code and pretrained Nebula objects are available as a pip package:

```bash
pip install git+https://github.com/dtrizna/nebula
```
```python
from nebula import Nebula

# 0. MODEL SETUP
nebula = Nebula(
    vocab_size=50000,   # pre-trained only for 50k
    seq_len=512,        # pre-trained only for 512
    tokenizer="bpe",    # supports: ["bpe", "whitespace"]
)

# 1. EMULATE IT: SKIP IF YOU HAVE JSON REPORTS ALREADY
pe = r"C:\Windows\System32\calc.exe"
report = nebula.dynamic_analysis_pe_file(pe)

# 2. PREPROCESS EMULATED JSON REPORT AS ARRAY
x_arr = nebula.preprocess(report)

# 3. PASS THROUGH PYTORCH MODEL
prob = nebula.predict_proba(x_arr)
print(f"\n[!] Probability of being malicious: {prob:.3f}")
```
Running this:

```
> python3 scripts\nebula_pe_to_preds.py

INFO:root: [!] Successfully loaded pre-trained tokenizer model!
INFO:root: [!] Loaded vocab from <REDACTED>\nebula\objects\bpe_50000_vocab.json
INFO:root: [!] Tokenizer ready!
INFO:root: [!] Model ready!

[!] Probability of being malicious: 0.001
```
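If emulated JSON reports are already on disk, step 1 above can be skipped and the same preprocessing and prediction calls apply directly. The following is a minimal sketch; it assumes `nebula.preprocess` accepts a report parsed from a JSON file, and `report.json` is a placeholder path:

```python
import json
from nebula import Nebula

nebula = Nebula(vocab_size=50000, seq_len=512, tokenizer="bpe")

# Load an existing emulation report instead of re-running dynamic analysis.
with open("report.json") as f:   # placeholder path to an existing report
    report = json.load(f)        # assumption: preprocess() accepts the parsed report

x_arr = nebula.preprocess(report)
prob = nebula.predict_proba(x_arr)
print(f"[!] Probability of being malicious: {prob:.3f}")
```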
Nebula can learn from unlabeled data using self-supervised learning (SSL) techniques; an extensive evaluation of SSL efficiency and an API-level interface are left as future work. Masked language modeling is implemented in the `nebula.lit_pretraining.MaskedLanguageModelTrainer` class, and autoregressive pretraining is implemented in the `nebula.lit_pretraining.AutoRegressiveModelTrainer` class.
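A minimal sketch of how such a pretrainer might be wired up is shown below. The constructor arguments and the `pretrain()` call are illustrative assumptions rather than the confirmed interface; only the class location in `nebula.lit_pretraining` comes from this repository, so consult `nebula/lit_pretraining.py` for the actual signatures:

```python
# Illustrative only: argument names and the pretrain() method are assumptions;
# the confirmed fact is that the class lives in nebula.lit_pretraining.
from nebula.lit_pretraining import MaskedLanguageModelTrainer

trainer = MaskedLanguageModelTrainer(
    vocab_size=50000,       # assumed: should match the Nebula tokenizer vocabulary
    mask_probability=0.15,  # assumed: fraction of tokens masked per sequence
)

# unlabeled_arrays = ...              # preprocessed, unlabeled token arrays
# trainer.pretrain(unlabeled_arrays)  # assumed entry point for SSL pretraining
```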