GuardDog is a CLI tool that identifies malicious PyPI packages. It runs a set of heuristics on the package source code (through Semgrep rules) and on the package metadata.
GuardDog can scan local or remote PyPI packages using any of the available heuristics.
```sh
pip install guarddog
```
Or use the Docker image:
```sh
docker pull ghcr.io/datadog/guarddog
alias guarddog='docker run --rm ghcr.io/datadog/guarddog'
```
```sh
# Scan the most recent version of the 'requests' package
guarddog pypi scan requests

# Scan a specific version of the 'requests' package
guarddog pypi scan requests --version 2.28.1

# Scan the 'requests' package using 2 specific heuristics
guarddog pypi scan requests --rules exec-base64 --rules code-execution

# Scan the 'requests' package using all rules but one
guarddog pypi scan requests --exclude-rules exec-base64

# Scan a local package
guarddog pypi scan /tmp/triage.tar.gz

# Scan every package referenced in a requirements.txt file of a local folder
guarddog pypi verify workspace/guarddog/requirements.txt

# Output JSON to standard output (works for every command)
guarddog pypi scan requests --output-format=json

# All the commands also work on npm
guarddog npm scan express
```
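To drive GuardDog from a Python script, one option is to build the command line from the flags shown above and decode the `--output-format=json` output. This is a minimal sketch, assuming `guarddog` is on your `PATH` and prints a single JSON document to standard output. The helper names `build_scan_command` and `scan` are hypothetical, and since the structure of the JSON report is not described here, the sketch only parses it without interpreting any fields.

```python
import json
import subprocess


def build_scan_command(package, ecosystem="pypi", version=None, rules=None, exclude_rules=None):
    """Build a `guarddog <ecosystem> scan` argv using only the flags documented above."""
    cmd = ["guarddog", ecosystem, "scan", package]
    if version is not None:
        cmd += ["--version", version]
    # --rules and --exclude-rules are repeated once per heuristic
    for rule in rules or []:
        cmd += ["--rules", rule]
    for rule in exclude_rules or []:
        cmd += ["--exclude-rules", rule]
    cmd.append("--output-format=json")
    return cmd


def scan(package, **kwargs):
    """Run guarddog (assumed to be on PATH) and return its decoded JSON output."""
    result = subprocess.run(
        build_scan_command(package, **kwargs),
        capture_output=True,
        text=True,
    )
    if not result.stdout.strip():
        raise RuntimeError(f"guarddog produced no output: {result.stderr.strip()}")
    return json.loads(result.stdout)
```

For example, `scan("requests", version="2.28.1", rules=["exec-base64"])` runs the same scan as the CLI examples above and returns whatever JSON GuardDog emits.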
GuardDog comes with 2 types of heuristics:

- Source code heuristics: Semgrep rules running against the package source code.
- Package metadata heuristics: Python heuristics running against the package metadata on PyPI.
Source code heuristic | Description |
---|---|
Command overwrite | The install command is overwritten in the `setup.py` file, indicating that a system command is automatically run when installing the package through `pip install`. |
Dynamic execution of base64-encoded data | A base64-encoded string ends up being executed by a function like `exec` or `eval`. |
Download of an executable to disk | Data coming from an HTTP response ends up being written to disk and made executable. |
Exfiltration of sensitive data to a remote server | Sensitive data from the environment ends up being sent through an HTTP request. |
Code execution in `setup.py` | Code in `setup.py` executes code dynamically or starts a new process. |
Unusual domain extension | Usage of a domain name with an extension frequently used by malware (e.g. `.xyz` or `.top`). |
Dynamic execution of hidden data from an image | The package uses steganography to extract a payload from an image and execute it. |
Use of a common obfuscation method | The package uses an obfuscation method commonly used by malware, such as running `eval` on hexadecimal strings. |
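GuardDog implements these checks as Semgrep rules. To illustrate what two of them look for, the following sketch swaps in Python's standard `ast` and `urllib.parse` modules instead of Semgrep. `find_exec_base64` flags `exec`/`eval` calls whose argument contains a base64 decode call, and `has_suspicious_tld` flags URLs on unusual domain extensions. The function names and the `SUSPICIOUS_TLDS` set are hypothetical and are not GuardDog's rules; unlike a dataflow-aware rule, this sketch only catches a decode call nested directly inside the `exec`/`eval` argument.

```python
import ast
from urllib.parse import urlparse

# Hypothetical, illustrative list; GuardDog's own rule may use a different set.
SUSPICIOUS_TLDS = {".xyz", ".top"}

EXEC_FUNCS = {"exec", "eval"}
B64_FUNCS = {"b64decode", "standard_b64decode", "urlsafe_b64decode", "decodebytes"}


def _call_name(node):
    """Bare name of a call: 'b64decode' for both b64decode(...) and base64.b64decode(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_exec_base64(source):
    """Return line numbers where exec/eval receives an argument containing a base64 decode call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _call_name(node) in EXEC_FUNCS:
            for arg in node.args:
                # Look anywhere inside the argument expression, e.g. b64decode(...).decode()
                if any(
                    isinstance(inner, ast.Call) and _call_name(inner) in B64_FUNCS
                    for inner in ast.walk(arg)
                ):
                    hits.append(node.lineno)
                    break
    return hits


def has_suspicious_tld(url):
    """True if the URL's hostname ends with one of SUSPICIOUS_TLDS."""
    host = urlparse(url).hostname or ""
    return any(host.endswith(tld) for tld in SUSPICIOUS_TLDS)
```

The sketch only parses the source and never executes it, which is the same reason GuardDog relies on static analysis rather than installing the package.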
Metadata heuristic | Description |
---|---|
Typosquatting | Package has a name close to one of the top 5,000 PyPI packages. |
Potentially compromised maintainer e-mail domain | Maintainer e-mail address is associated with a domain that was re-registered later than the last package release. This can indicate a custom domain that expired and was leveraged by an attacker to compromise the package owner's PyPI account. See here for a description of the issue for npm. |
Empty package description | Package has an empty description on PyPI. |
Release 0.0.0 | Package has its latest release set to 0.0.0 or 0.0. |
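The metadata heuristics above reduce to simple checks on a package's metadata. The sketch below applies them to a dict shaped like the `info` object of the PyPI JSON API (`name`, `description`, `version`). It is an illustration under stated assumptions, not GuardDog's implementation: typosquatting is approximated as a Levenshtein distance of at most 1 to a list of popular names you supply, the domain re-registration date is passed in rather than looked up (a real check would need WHOIS data), and the flag labels are hypothetical.

```python
from datetime import date
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = cur
    return prev[-1]


def typosquat_targets(name: str, popular: list, max_distance: int = 1) -> list:
    """Popular names within max_distance edits of `name`, excluding an exact match."""
    name = name.lower()
    return [
        p for p in popular
        if p.lower() != name and levenshtein(name, p.lower()) <= max_distance
    ]


def metadata_flags(info: dict, popular: list,
                   domain_registered: Optional[date] = None,
                   last_release: Optional[date] = None) -> list:
    """Return hypothetical flag labels for the metadata heuristics that fire."""
    flags = []
    if typosquat_targets(info.get("name", ""), popular):
        flags.append("typosquatting")
    if not (info.get("description") or "").strip():
        flags.append("empty-description")
    if info.get("version") in ("0.0.0", "0.0"):
        flags.append("release-zero")
    # Domain re-registered after the last release: possible account takeover
    if domain_registered and last_release and domain_registered > last_release:
        flags.append("potentially-compromised-email-domain")
    return flags
```

A distance threshold of 1 catches single-character insertions, deletions and substitutions such as `requets`, but not transpositions such as `reqeusts` (distance 2); the real heuristic may use a different metric or threshold.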
- Ensure `python>=3.10` is installed
- Clone the repository
- Create a virtualenv: `python3 -m venv venv && source venv/bin/activate`
- Install requirements: `pip install -r requirements.txt`
- Run GuardDog using `python -m guarddog`
Alternatively, using Poetry:

- Ensure Poetry has an env with `python >=3.10`: `poetry env use 3.10.0`
- Install dependencies: `poetry install`
- Run GuardDog: `poetry run guarddog`, or `poetry shell` and then run `guarddog`
- Running all unit tests: `make test`
- Running unit tests against Semgrep rules: `make test-semgrep-rules` (tests are here). These use the standard methodology for testing Semgrep rules.
- Running unit tests against package metadata heuristics: `make test-metadata-rules` (tests are here).
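Semgrep's standard rule-testing convention annotates a target file with `# ruleid: <rule-id>` above each line that must match and `# ok: <rule-id>` above each line that must not, and `semgrep --test` checks the rule against those annotations. A minimal sketch of such a test file for the `exec-base64` rule, assuming that rule flags a direct `exec(base64.b64decode(...))` call; the file name and location GuardDog uses for its tests are not given here.

```python
import base64

# ruleid: exec-base64
exec(base64.b64decode("cHJpbnQoJ2hlbGxvJyk="))  # decodes to print('hello')

# ok: exec-base64
exec("print('hello')")
```

The file is meant to be scanned, not run: executing it directly would simply print `hello` twice.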
Type checking:

```sh
pip install mypy
make type-check
```

Linting:

```sh
pip install flake8
make lint
```
You can also use pre-commit hooks. Install them once using:

```sh
pip install pre-commit
pre-commit install
```

This will cause `make lint` and `make type-check` to run automatically before each of your commits, failing early if your code has an issue that would fail on CI.
Authors:
Inspiration: