H4dr1en/guarddog

GuardDog is a CLI tool for identifying malicious PyPI packages. It runs a set of heuristics on the package source code (through Semgrep rules) and on the package metadata.

GuardDog can be used to scan local or remote PyPI packages using any of the available heuristics.

Getting started

Installation

pip install git+https://github.com/DataDog/guarddog.git

Sample usage

# Scan the most recent version of the 'requests' package
guarddog scan requests

# Scan a specific version of the 'requests' package
guarddog scan requests --version 2.28.1

# Scan the 'requests' package using 2 specific heuristics
guarddog scan requests --rules exec-base64 --rules code-execution

# Scan the 'requests' package using all rules but one
guarddog scan requests --exclude-rules exec-base64

# Scan a local package
guarddog scan /tmp/triage.tar.gz

# Scan every package referenced in a requirements.txt file of a local folder
guarddog verify workspace/guarddog/requirements.txt

# Output JSON to standard output - works for every command
guarddog scan requests --json
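
When the JSON output is consumed by another program, a thin wrapper can be convenient. The sketch below is an illustration and not part of GuardDog: `build_scan_command` and `scan_package` are hypothetical helper names. It uses only the `scan`, `--version` and `--json` options shown above, and assumes nothing about the report beyond it being valid JSON on standard output.

```python
import json
import subprocess
from typing import Any, Callable, List, Optional


def build_scan_command(package: str, version: Optional[str] = None) -> List[str]:
    """Build the argument list for `guarddog scan <package> --json`."""
    command = ["guarddog", "scan", package]
    if version is not None:
        command += ["--version", version]
    command.append("--json")
    return command


def _run(command: List[str]) -> str:
    """Run a command and return its standard output as text."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


def scan_package(
    package: str,
    version: Optional[str] = None,
    runner: Callable[[List[str]], str] = _run,
) -> Any:
    """Scan a package and return the parsed JSON report, unmodified.

    `runner` is injectable (hypothetical design choice) so the wrapper
    can be exercised without GuardDog installed.
    """
    return json.loads(runner(build_scan_command(package, version)))
```

With GuardDog installed and on the `PATH`, `scan_package("requests", version="2.28.1")` would run the second command from the list above and return the decoded report.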

Heuristics

GuardDog comes with two types of heuristics: source code heuristics and package metadata heuristics.

Source code heuristics

Heuristic | Description
Command overwrite | The install command is overwritten in the setup.py file, indicating that a system command is automatically run when installing the package through pip install.
Dynamic execution of base64-encoded data | A base64-encoded string ends up being executed by a function like exec or eval.
Download of an executable to disk | Data coming from an HTTP response ends up being written to disk and made executable.
Exfiltration of sensitive data to a remote server | Sensitive data from the environment ends up being sent through an HTTP request.
Code execution in setup.py | Code in setup.py executes code dynamically or starts a new process.
Unusual domain extension | Usage of a domain name with an extension frequently used by malware (e.g. .xyz or .top).
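
To make the "dynamic execution of base64-encoded data" pattern concrete, here is a minimal sketch of a detector built on Python's standard `ast` module. It is not how GuardDog works (GuardDog uses Semgrep rules, as stated above), and the names `find_exec_of_base64` and `SAMPLE` are hypothetical. It only catches the direct case where the decode call appears inside the `exec` or `eval` argument, and does not follow data through intermediate variables.

```python
import ast
from typing import List

EXEC_FUNCTIONS = {"exec", "eval"}
DECODE_FUNCTIONS = {"b64decode"}


def _call_name(node: ast.AST) -> str:
    """Return the trailing name of a call target, e.g. 'b64decode' for base64.b64decode(...)."""
    if isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Name):
            return func.id
        if isinstance(func, ast.Attribute):
            return func.attr
    return ""


def find_exec_of_base64(source: str) -> List[int]:
    """Return sorted line numbers where exec/eval receives an argument containing a b64decode call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if _call_name(node) in EXEC_FUNCTIONS:
            for arg in node.args:
                if any(_call_name(inner) in DECODE_FUNCTIONS for inner in ast.walk(arg)):
                    findings.append(node.lineno)
                    break
    return sorted(findings)


# The sample is only parsed, never executed.
SAMPLE = '''import base64
exec(base64.b64decode("cHJpbnQoJ2hpJyk="))
print("harmless")
eval(base64.b64decode(payload).decode())
'''
```

Calling `find_exec_of_base64(SAMPLE)` reports lines 2 and 4 and leaves the harmless `print` on line 3 alone.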

Package metadata heuristics

Heuristic | Description
Typosquatting | Package has a name close to one of the top 5k PyPI packages.
Potentially compromised maintainer e-mail domain | Maintainer e-mail address is associated with a domain that was re-registered later than the last package release. This can indicate a custom domain that expired and was then used by an attacker to compromise the package owner's PyPI account. See here for a description of the issue for npm.
Empty package description | Package has an empty description on PyPI.
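
The typosquatting heuristic relies on a notion of name "closeness". The source does not say which metric GuardDog uses, so the sketch below assumes plain Levenshtein edit distance with a threshold of 1. The function names and the short list of popular packages are hypothetical stand-ins for the top 5k list.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def possible_typosquats(name: str, popular: Iterable[str], max_distance: int = 1) -> List[str]:
    """Return popular package names that `name` closely resembles without being identical."""
    normalized = name.lower().replace("_", "-")
    matches = []
    for candidate in popular:
        target = candidate.lower().replace("_", "-")
        if normalized != target and edit_distance(normalized, target) <= max_distance:
            matches.append(candidate)
    return matches
```

For example, `possible_typosquats("request", ["requests", "numpy"])` returns `["requests"]`, while the legitimate name `requests` itself yields an empty list. A transposition such as `reqeusts` has a Levenshtein distance of 2, so catching it would need a higher threshold or a different metric.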

Development

Running a local version of GuardDog

  • Clone the repository
  • Create a virtualenv: python3 -m venv venv && source venv/bin/activate
  • Run GuardDog using python -m guarddog

Unit tests

Running all unit tests: make test

Running unit tests against Semgrep rules: make test-semgrep-rules (tests are here). These use the standard methodology for testing Semgrep rules.
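
Semgrep's testing convention relies on annotation comments in the target file: `# ruleid: <rule-id>` goes on the line before code the rule must flag, and `# ok: <rule-id>` goes before code it must not flag. The sketch below shows what such a target file might look like. It is hypothetical and not copied from the repository, and it assumes the Semgrep rule id matches the `exec-base64` name used with `--rules` above.

```python
import base64


def flagged_example(payload: str) -> None:
    # ruleid: exec-base64
    exec(base64.b64decode(payload))


def safe_example(payload: str) -> bytes:
    # ok: exec-base64
    return base64.b64decode(payload)
```

The functions are never called by the test run. Semgrep only analyzes the file and compares its findings with the annotations.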

Running unit tests against package metadata heuristics: make test-metadata-rules (tests are here).

Adding new source code heuristics

TBD

Adding new package metadata heuristics

TBD

Acknowledgments

Authors:

Inspiration:
