GuardDog: PyPI Package Malware Scanner

GuardDog is a CLI tool that scans PyPI packages for user-selected malware indicators. A set of predefined rules based on package registry metadata and source code analysis is used as heuristics to find malicious packages.

Getting Started

guarddog can be used to scan local or remote PyPI packages using any of the available rules. Here's how to use guarddog:

CLI Reference

The structure for scanning a package is:

$ python3 -m guarddog scan [NAME] -v [VERSION] -r [RULE]

# Scan the most recent version
$ python3 -m guarddog scan setuptools 

# Scan a specific version
$ python3 -m guarddog scan setuptools -v 63.6.0 

# Scan a local package
$ python3 -m guarddog scan ./Desktop/packagename 

# Scan using a subset of the rules
$ python3 -m guarddog scan setuptools -v 63.6.0 -r code-execution -r shady-links 

To scan a requirements.txt file, use the verify command. You can also specify the name of the requirements file if it differs from requirements.txt, as well as an output file to store the results in.

$ python3 -m guarddog verify [PATH] -r [REQUIREMENTS-NAME] -o [OUTPUT-FILE]

$ python3 -m guarddog verify [REPOSITORY-URL] [BRANCH] -r [REQUIREMENTS-NAME] -o [OUTPUT-FILE]

# Verify a remote project and store the results in an output file
$ python3 -m guarddog verify https://github.com/DataDog/guarddog/ main -o ./output.json

# Verify a local project with a differently named requirements file
$ python3 -m guarddog verify ./samplepackage -r requirements2.txt

Note that to scan with only a subset of the rules, use multiple -r flags.

Installing guarddog

guarddog is not yet packaged. To run it in a development environment, see CONTRIBUTING.

Testing

To run the semgrep rules against the test cases:

$ semgrep --metrics off --quiet --test --config guarddog/analyzer/sourcecode tests/analyzer/sourcecode

To find the precision and recall of the rules, run:

$ python3 evaluator/evaluator.py

This will calculate the false positive, false negative, true positive, and true negative rates from the logs in the guarddog_tests/evaluator/logs folder, which contain the results of scanning the data folder.

Running the command above does not rescan the directories. To rescan, uncomment the metric_generator.scan() call on line 351 of guarddog_tests/evaluator/evaluator.py, then run the command again.

Heuristics

Heuristics are separated into two categories: registry metadata analysis and source code analysis. Registry metadata pertains to the metrics of a given package on the PyPI registry (e.g., number of maintainers, popularity, similarity to other package names, gaps between code pushes), while source code analysis investigates the actual code of the package. The malicious packages analyzed to guide these heuristics are listed here: PyPI Malware Analysis.

Accuracy of Heuristics

The precision, recall, and false positive rate of each rule were measured using the methods described in Testing. The values achieved are:

Rule                        Precision   Recall   FP rate
cmd-overwrite               0.429       1.0      0.015
code-execution              0.0769      1.0      0.035
download-executable         1.0         1.0      0.0
exec-base64                 0.5         1.0      0.001
exfiltrate-sensitive-data   0.526       1.0      0.012
shady-links                 0.186       1.0      0.017
typosquatting               0.958       0.719    0.0

The typosquatting rule ignores the top 5000 most downloaded packages (over the past month), so all of its error comes from typosquatting cases it missed while scanning the malware dataset.

Methodology

The precision and recall of each rule were measured by running the tool on the 1000 most downloaded PyPI packages (the benign dataset) and on a collection of about 30-40 pieces of malware that had been removed from PyPI (the malicious dataset). Every line in the top 1000 packages is assumed to be safe, so any line flagged there counts as a false positive. In the malicious dataset, dangerous lines were hand-labeled in malicious_ground_truth.json and compared to the actual results: a ground-truth line the tool missed counts as a false negative, a flagged line that matches the ground truth counts as a true positive, and a flagged line not in the ground truth counts as a false positive. Precision and recall were calculated from these counts.
The false positive rate uses only the benign dataset, at package-level granularity: if any line in a package was flagged, the package counts as a false positive; if no lines were flagged, it counts as a true negative. The coarser granularity compared to precision/recall is a result of being unable to measure the number of lines in the benign dataset.
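For reference, here is a minimal sketch of how these counts translate into the reported metrics. The counts passed in the example calls are placeholders, not values from the actual evaluation.

# Sketch of the metric calculations described above.
# The counts used in the example calls are placeholders, not measured values.

def precision(tp: int, fp: int) -> float:
    # Fraction of flagged lines that are actually malicious
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Fraction of hand-labeled malicious lines that were flagged
    return tp / (tp + fn)

def false_positive_rate(fp: int, tn: int) -> float:
    # Package-level: flagged benign packages out of all benign packages
    return fp / (fp + tn)

print(precision(tp=8, fp=2))                 # 0.8
print(recall(tp=8, fn=2))                    # 0.8
print(false_positive_rate(fp=10, tn=990))    # 0.01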

Registry Metadata Analysis

The registry metadata analysis looks for the flags detailed in the paper here: https://arxiv.org/pdf/2112.10165.pdf

Typosquatting
  Reason: The most common way attackers get developers to install their package.
  Heuristic: Check for a Levenshtein distance of one, swapped terms around hyphens, "py" switched to "python" (or vice versa), and lookalike letters. (A rough sketch of this check appears after this list.)
  Examples: Too many to name.

Reregistered maintainer domain
  Reason: Attackers can purchase an expired domain and hijack the associated account.
  Heuristic: Check the creation date of the author's email domain on who.is and compare it to the package's most recent release dates.
  Examples: ctx

Empty Package Information
  Reason: Legitimate packages rarely have empty descriptions.
  Heuristic: Check whether the package description is empty.
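As a rough illustration of the typosquatting heuristic above, the sketch below flags a candidate package name that is within a Levenshtein distance of one of a popular name, or that swaps "py" and "python". The popular_packages list and the helper names are illustrative only, not guarddog's actual implementation.

# Illustrative sketch of the typosquatting check described above.
# The popular_packages list and function names are examples, not guarddog's code.

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def looks_like_typosquat(name: str, popular_packages: list[str]) -> bool:
    for popular in popular_packages:
        if name == popular:
            return False  # exact match with a popular package: not a typosquat
        if levenshtein(name, popular) == 1:
            return True   # one-character edit, e.g. "reqests" vs "requests"
        if name.replace("python", "py") == popular.replace("python", "py"):
            return True   # "py"/"python" swap, e.g. "py-dateutil" vs "python-dateutil"
    return False

print(looks_like_typosquat("reqests", ["requests", "numpy"]))  # True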

Source Code Analysis

cmd-overwrite: Install command overwritten in setup.py
  Reason: Custom "pip install" scripts in setup.py allow attackers to run privileged code as soon as their package is installed.
  Heuristic: Semgrep for instances of cmdclass = {"install": [new script]}, or other equivalents, in the setup(...) call in setup.py.
  Examples: httplib3, htpplib2, request-oathlib, unicode-csv, etc.

exec-base64: Executing hardcoded base64-encoded strings
  Reason: A common obfuscation tactic.
  Heuristic: Semgrep for base64 decoding functions and use source/sink analysis to determine whether the decoded data is evaluated. (An illustrative snippet of this pattern appears after this list.)
  Examples: colourama, httplib3, request-oathlib, unicode-csv, etc.

code-execution: Executing code or spawning processes
  Reason: Attackers commonly execute privileged shell commands and other scripts through exec, eval, subprocess.getoutput, and similar calls.
  Heuristic: Semgrep for functions like exec, eval, and subprocess.getoutput, filtering out benign commands like git or pip freeze.
  Examples: colourama, loglib-modules, pzymail

shady-links: Suspicious domains
  Reason: Attackers often use free domains or URL shorteners to host scripts or to receive POST requests containing sensitive information.
  Heuristic: Semgrep for suspicious domain extensions like "link" and "xyz", in addition to bit.ly links.
  Examples: pzymail, py-jwt, pyjtw, tenserflow, etc.

download-executable: Creating an executable in setup.py
  Reason: setup.py can be used as a gateway to execute other dangerous Python scripts, often fetched from the attacker's server with a GET request.
  Heuristic: Semgrep source/sink analysis to hunt for function calls that fetch data (request), then create files from that data and change the file permissions (os.chmod).
  Examples: distrib, colourama, pzymail

exfiltrate-sensitive-data: Spying on sensitive system information
  Reason: Attackers collect environment variables, IP addresses, usernames, OS information, etc. and send this data to their server.
  Heuristic: Semgrep source/sink analysis to hunt for variables that record system information using the os, platform, socket, etc. modules.
  Examples: distrib, loglib-modules, tenserflow