Thanks for your interest in helping to grow this repository and make it better for developers everywhere! This document is a guide to help you quickly gain familiarity with the repository and set up your development environment so that you can hit the ground running.
/detect_secrets               # This is where the main code lives
  /core                       # Powers the detect-secrets engine
  /plugins                    # All plugins live here, modularized.
    /common                   # Common logic shared between plugins
  main.py                     # Entrypoint for console use
  pre_commit_hook.py          # Entrypoint for pre-commit hook
/test_data                    # Sample files used for testing purposes
/testing                      # Common logic used in test cases
/tests                        # Mirrors detect_secrets layout for all tests
There are several ways to spin up your virtual environment:
virtualenv --python=python3 venv
source venv/bin/activate
pip install -r requirements-dev.txt
or
python3 -m venv venv
source venv/bin/activate
pip install -r requirements-dev.txt
or
tox -e venv
source venv/bin/activate
Whichever way you choose, you can check to see whether you're successful by executing:
PYTHONPATH=`pwd` python detect_secrets/main.py --version
There are many examples of existing plugins to reference under `detect_secrets/plugins`. However, this is the overall workflow:
- Write your tests
  Before you write your plugin, you should know what it intends to do: what it should catch, and arguably more importantly, what it should avoid. Formalize these examples in tests!
  For a basic example, see `tests/plugins/basic_auth_test.py`.
- Write your plugin
  All plugins MUST inherit from `detect_secrets.plugins.base.BasePlugin`. See that class's docstrings for more detailed information.
  Depending on the complexity of your plugin, you may be able to inherit from `detect_secrets.plugins.base.RegexBasedDetector` instead. This is useful if you merely want to add a new regex rule. Check out `detect_secrets/plugins/basic_auth.py` for a good example of this.
  Be sure to write comments about why your particular regex was crafted as it is!
- Update documentation
  Be sure to add your changes to `README.md` and `CHANGELOG.md` so that it is easier for maintainers to bump the version and for downstream consumers to get the latest information about available plugins.
- There should be a total of three modified files in a minimal new plugin: the plugin file, its corresponding test, and an updated README.
- If your plugin uses customizable options (e.g. the entropy limit in `HighEntropyStrings`), be sure to add default options to the plugin's `default_options`.
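As a rough illustration of the test-first step, the sketch below pins down what a rule should and should not catch before any plugin class exists. The regex, the `find_secrets` helper, and the sample lines are all assumptions made up for this example; they are not taken from `detect_secrets/plugins/basic_auth.py`, and a real plugin would still need to inherit from one of the base classes mentioned above.

```python
import re

# Hypothetical rule: a password embedded in a URL, e.g. scheme://user:password@host.
# Excluding whitespace, '/', ':' and '@' from each part keeps the match
# inside a single URL authority section.
URL_CREDENTIALS = re.compile(r'://[^\s/:@]+:([^\s/:@]+)@')


def find_secrets(line):
    """Return every password-like group found in a single line."""
    return URL_CREDENTIALS.findall(line)


# What the rule should catch.
SHOULD_FLAG = [
    'https://username:password@example.com',
    'url = "ftp://deploy:hunter2@files.example.com/data"',
]

# What the rule should avoid.
SHOULD_IGNORE = [
    'https://example.com/path',
    'http://localhost:8080/health',
    'contact: admin@example.com',
]


def test_flags_credentials_in_urls():
    for line in SHOULD_FLAG:
        assert find_secrets(line), line


def test_ignores_lines_without_credentials():
    for line in SHOULD_IGNORE:
        assert not find_secrets(line), line
```

Writing the "should ignore" list first tends to expose over-eager patterns early, such as a rule that mistakes a port number for a password.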
You can run the test suite in the interpreter of your choice (in this example, `py35`) by running:
tox -e py35
For a list of supported interpreters, check out `envlist` in `tox.ini`.
To run the test suite against all interpreters (this might take a while), you can also just run:
make test
With `pytest`, you can specify the tests you want to run at multiple levels of granularity. Here are a few examples:
- Running all tests related to `core/baseline.py`
  pytest tests/core/baseline_test.py
- Running a single test class
  pytest tests/core/baseline_test.py::TestInitializeBaseline
- Running a single test function inside a test class
  pytest tests/core/baseline_test.py::TestInitializeBaseline::test_basic_usage
- Running a single root-level test function
  pytest tests/plugins/base_test.py::test_fails_if_no_secret_type_defined
The PotentialSecret class lives at the very heart of the engine, and represents a line being flagged for its potential to be a secret.
Since the detect-secrets engine is heuristics-based, it requires a human to read its output at some point to determine false/true positives. Therefore, its representation is tailored to support high readability. Its attributes represent values that you would want to know (and keep track of) for each potential secret, including:
- What is it?
- How was it found?
- Where is it found?
- Is it a true/false positive?
We can see that the JSON dump clearly shows this.
{
    "type": "Base64 High Entropy String",
    "filename": "test_data/config.yaml",
    "line_number": 5,
    "hashed_secret": "bc9160bc0ff062e1b2d21d2e59f6ebaba104f051",
    "is_secret": false
}
However, since it is designed for easy reading, we didn't want the baseline to be the single file that contained all the secrets in a given repository. Therefore, we mask the secret by hashing it with three core attributes:
- The actual secret
- The filepath where it was found
- How the engine determined it was a secret
Two potential secrets are considered equal when all three of these values match.
This means that the engine will flag the following cases as separate occurrences to investigate:
- Same secret value, but present in different files
- Same secret value, caught by multiple plugins
Furthermore, to minimize noise, the engine will not flag every single usage of a given secret within a given file.
Important Note: The line number does not play a part in the identification of a potential secret because code is expected to move around through continuous iteration. However, the `audit` tool leverages these line numbers to quickly locate the secret that was identified by a given plugin.
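The identity rules above can be sketched in a few lines. This is an illustrative stand-in, not the real class: the name `PotentialSecretSketch`, its constructor signature, and the choice of SHA-1 (picked only because it produces a 40-character hex digest like the one in the JSON dump) are assumptions.

```python
import hashlib


class PotentialSecretSketch:
    """Illustrative stand-in for a potential secret (not the real class)."""

    def __init__(self, secret_type, filename, secret, line_number=0):
        self.type = secret_type
        self.filename = filename
        self.line_number = line_number
        # Only a hash of the secret is kept, so a dump never holds plaintext.
        # SHA-1 is an assumption made for this sketch.
        self.hashed_secret = hashlib.sha1(secret.encode('utf-8')).hexdigest()

    def _identity(self):
        # line_number is deliberately left out: code moves around.
        return (self.type, self.filename, self.hashed_secret)

    def __eq__(self, other):
        return (
            isinstance(other, PotentialSecretSketch)
            and self._identity() == other._identity()
        )

    def __hash__(self):
        return hash(self._identity())


first = PotentialSecretSketch('Plugin A', 'config.yaml', 'hunter2', line_number=5)
moved = PotentialSecretSketch('Plugin A', 'config.yaml', 'hunter2', line_number=42)
other_file = PotentialSecretSketch('Plugin A', 'settings.yaml', 'hunter2')
other_plugin = PotentialSecretSketch('Plugin B', 'config.yaml', 'hunter2')

# Same secret, file, and plugin: one occurrence, regardless of line number.
assert first == moved
# A different file or a different plugin counts as a separate occurrence.
assert len({first, moved, other_file, other_plugin}) == 3
```

Defining `__hash__` alongside `__eq__` lets these objects live in sets and dictionary keys, which is what makes the "do not flag every usage in a file" behavior fall out naturally.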
A collection of PotentialSecret objects is stored in a `SecretsCollection`. This contains a list of all the secrets in a given repository, as well as any other details needed to recreate it.
A formatted dump of a `SecretsCollection` is used as the baseline file.
In this way, the overall baseline logic is simple:
- Scan the repository to create a collection of known secrets.
- Check every new secret against this collection of known secrets.
- If a secret was not previously known, alert on it.
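The three steps above amount to a set-membership check. The following is a minimal sketch under simplifying assumptions: secrets are represented as plain `(type, filename, hashed_secret)` tuples, and `find_new_secrets` is a hypothetical helper name rather than a function from the codebase.

```python
def find_new_secrets(baseline, current):
    """Return the secrets in `current` that `baseline` does not know about.

    Both arguments are iterables of (type, filename, hashed_secret) tuples,
    a simplified stand-in for real secret objects.
    """
    known = set(baseline)
    return [secret for secret in current if secret not in known]


# Step 1: the collection of known secrets from an earlier scan.
baseline = [
    ('Plugin A', 'config.yaml', 'aaa111'),
]

# Step 2: the secrets found by a new scan.
current = [
    ('Plugin A', 'config.yaml', 'aaa111'),   # already known
    ('Plugin A', 'deploy.sh', 'bbb222'),     # previously unknown
]

# Step 3: alert only on what was not previously known.
new_secrets = find_new_secrets(baseline, current)
assert new_secrets == [('Plugin A', 'deploy.sh', 'bbb222')]
```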
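With this in mind, this class exposes three types of methods: those that create a collection from a baseline, those that add secrets to it, and those that produce baseline output.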
We need to create a `SecretsCollection` object from a formatted baseline output, so that we can compare new secrets against it. This means that the baseline must include all information needed to initialize a `SecretsCollection`, such as:
- Secrets found
- Files to exclude
- Plugin configurations
- Version of detect-secrets used
Once we have a collection of secrets, we can add secrets to it via various methods of scanning strings. These methods (e.g. `scan_file`, `scan_diff`) should handle iterating through all plugins and adding the results found to the collection.
We need to be able to create a baseline from a SecretsCollection, so that it can be used for future comparisons. In the same spirit as the PotentialSecret object, it is designed for high readability, and may contain other metadata that aids human analysis of the generated output (e.g. `generated_at` time).
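To make the load/add/output cycle concrete, here is a minimal sketch of a collection that round-trips through a JSON baseline. Everything in it is hypothetical: the class name `SecretsCollectionSketch`, the method names, and the baseline keys (`version`, `exclude_files`, `plugins`, `results`) are assumptions chosen to mirror the list above, not the real baseline format.

```python
import json
from datetime import datetime, timezone


class SecretsCollectionSketch:
    """Illustrative stand-in for a secrets collection (not the real class)."""

    def __init__(self, version='0.0.0', exclude_files=None, plugins=None):
        self.version = version
        self.exclude_files = exclude_files
        self.plugins = plugins or []
        self.data = {}  # filename -> list of secret dicts

    @classmethod
    def load_from_baseline(cls, text):
        """Rebuild a collection from a formatted baseline string."""
        raw = json.loads(text)
        collection = cls(raw['version'], raw['exclude_files'], raw['plugins'])
        for filename, secrets in raw['results'].items():
            for secret in secrets:
                collection.add(filename, secret)
        return collection

    def add(self, filename, secret):
        """Add a secret dict unless the file already holds an equal one."""
        bucket = self.data.setdefault(filename, [])
        key = (secret['type'], secret['hashed_secret'])
        if all((s['type'], s['hashed_secret']) != key for s in bucket):
            bucket.append(secret)

    def format_for_output(self):
        """Dump a readable baseline, with metadata to aid human review."""
        return json.dumps(
            {
                'generated_at': datetime.now(timezone.utc).strftime(
                    '%Y-%m-%dT%H:%M:%SZ',
                ),
                'version': self.version,
                'exclude_files': self.exclude_files,
                'plugins': self.plugins,
                'results': self.data,
            },
            indent=2,
            sort_keys=True,
        )


collection = SecretsCollectionSketch('1.0.0', '^tests/', [{'name': 'Plugin A'}])
secret = {'type': 'Plugin A', 'hashed_secret': 'aaa111', 'line_number': 5}
collection.add('config.yaml', secret)
collection.add('config.yaml', dict(secret, line_number=9))  # same secret, moved

restored = SecretsCollectionSketch.load_from_baseline(collection.format_for_output())
assert restored.data == collection.data
```

Sorting keys and indenting the dump keeps diffs of the baseline file small and reviewable, which fits the emphasis on readability.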