This repository has been archived by the owner on Jul 12, 2022. It is now read-only.

updating CONTRIBUTING.md
Aaron Loo committed May 29, 2019

1 parent 5820fa7 commit f8682ab
Showing 1 changed file with 96 additions and 0 deletions.
96 changes: 96 additions & 0 deletions CONTRIBUTING.md
@@ -143,3 +143,99 @@ levels. Here are a couple of examples:
```bash
pytest tests/plugins/base_test.py::test_fails_if_no_secret_type_defined
```

## Technical Details

### PotentialSecret

This lives at the very heart of the engine, and represents a line being flagged
for its potential to be a secret.

Since the detect-secrets engine is heuristics-based, it requires a human to read
its output at some point to determine false/true positives. Therefore, its
representation is tailored to support **high readability**. Its attributes
represent values that you would want to know (and keep track of) for
each potential secret, including:

1. What is it?
2. How was it found?
3. Where is it found?
4. Is it a true/false positive?

We can see that the JSON dump clearly shows this.

```json
{
    "type": "Base64 High Entropy String",
    "filename": "test_data/config.yaml",
    "line_number": 5,
    "hashed_secret": "bc9160bc0ff062e1b2d21d2e59f6ebaba104f051",
    "is_secret": false
}
```

However, because the baseline is designed for easy reading, we didn't want it
to become a single file containing every secret in a given repository.
Therefore, the secret itself is masked by hashing, and each potential secret is
identified by three core attributes:

1. The actual secret
2. The filepath where it was found
3. How the engine determined it was a secret

Two potential secrets are considered **equal if all three values match**.

This means that the engine will flag the following cases as separate occurrences
to investigate:

* Same secret value, but present in different files
* Same secret value, caught by multiple plugins

Conversely, the engine will **not** flag every single usage of a given secret
within a given file, which minimizes noise.

**Important Note:** The line number does not play a part in the identification
of a potential secret, because code is expected to move around through
continuous iteration. However, the `audit` tool uses these line numbers to
quickly locate the secret that a given plugin flagged.
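
The identity rules above can be sketched as follows. This is a minimal illustration, not the library's actual class: the name `SketchSecret` is hypothetical, and SHA-1 is an assumption (the 40-character `hashed_secret` in the dump is consistent with it, but the text does not name the algorithm).

```python
import hashlib


class SketchSecret:
    """Hypothetical stand-in for PotentialSecret; illustrates identity only."""

    def __init__(self, type_, filename, secret, line_number=0):
        self.type = type_
        self.filename = filename
        self.line_number = line_number
        # Assumption: SHA-1 is used purely for illustration.
        # Only the hash is kept, never the plaintext secret.
        self.hashed_secret = hashlib.sha1(secret.encode('utf-8')).hexdigest()

    def _identity(self):
        # line_number is deliberately excluded, since code moves around.
        return (self.type, self.filename, self.hashed_secret)

    def __eq__(self, other):
        if not isinstance(other, SketchSecret):
            return NotImplemented
        return self._identity() == other._identity()

    def __hash__(self):
        return hash(self._identity())


original = SketchSecret('Toy Plugin', 'config.yaml', 'hunter2', line_number=5)
moved = SketchSecret('Toy Plugin', 'config.yaml', 'hunter2', line_number=42)
other_file = SketchSecret('Toy Plugin', 'other.yaml', 'hunter2', line_number=5)
```

Here `original` and `moved` compare equal despite their different line numbers, while `other_file` is a separate occurrence because its filename differs.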

### SecretsCollection

A collection of `PotentialSecret` objects is stored in a `SecretsCollection`.
This contains a list of all the secrets in a given repository, as well as any
other details needed to recreate it.

A formatted dump of a `SecretsCollection` is used as the baseline file.

In this way, the overall baseline logic is simple:

1. Scan the repository to create a collection of known secrets.
2. Check every new secret against this collection of known secrets.
3. If a secret was not previously known, alert on it.
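
As a rough sketch of these three steps, the comparison reduces to a set-membership check. The function name `find_new_secrets` and the tuple representation `(type, filename, hashed_secret)` are assumptions made for illustration, and the second hash below is only a placeholder value.

```python
def find_new_secrets(known, current):
    """Return the entries in `current` that are absent from `known`.

    Both arguments are iterables of (type, filename, hashed_secret) tuples.
    """
    known_set = set(known)
    return [secret for secret in current if secret not in known_set]


# Step 1: the collection of known secrets (the baseline).
known = [
    ('Base64 High Entropy String', 'test_data/config.yaml',
     'bc9160bc0ff062e1b2d21d2e59f6ebaba104f051'),
]

# Step 2: results of a later scan, containing one previously unseen entry.
current = known + [
    ('Toy Plugin', 'app.py', 'da39a3ee5e6b4b0d3255bfef95601890afd80709'),
]

# Step 3: anything not previously known should raise an alert.
to_alert = find_new_secrets(known, current)
```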

With this in mind, this class exposes three types of methods:

##### 1. Creating

We need to create a `SecretsCollection` object from a formatted baseline output,
so that we can compare new secrets against it. This means that the baseline
**must** include all information needed to initialize a `SecretsCollection`,
such as:

* Secrets found
* Files to exclude
* Plugin configurations
* Version of detect-secrets used
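
One way to enforce this requirement is to validate the baseline when loading it. The sketch below is an assumption-laden illustration: the key names (`results`, `exclude_files`, `plugins`, `version`) and the function `load_baseline` are hypothetical and may not match the real baseline schema.

```python
import json

# Hypothetical key names, one per item in the list above.
REQUIRED_KEYS = ('results', 'exclude_files', 'plugins', 'version')


def load_baseline(text):
    """Parse baseline JSON and check that nothing needed for a rebuild is missing."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError('baseline is missing: ' + ', '.join(missing))
    return data


baseline_text = json.dumps({
    'results': [],
    'exclude_files': '^tests/',
    'plugins': [{'name': 'ToyPlugin'}],
    'version': '0.0.0',
})
loaded = load_baseline(baseline_text)
```

Failing early on an incomplete baseline avoids silently comparing new secrets against a collection that was rebuilt with different settings.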

##### 2. Adding

Once we have a collection of secrets, we can add to it by scanning strings.
Each scanning method (e.g. `scan_file`, `scan_diff`) should handle iterating
through all plugins and adding any results found to the collection.
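
A minimal sketch of such a scanning method is shown below. Everything here is hypothetical: `SketchCollection`, `scan_lines`, and `toy_password_plugin` are illustrative names rather than the library's API, and SHA-1 is assumed for hashing.

```python
import hashlib
import re


def toy_password_plugin(line):
    """Hypothetical plugin: yields (type, secret) pairs found in a line."""
    for match in re.finditer(r'password\s*=\s*"([^"]+)"', line):
        yield 'Toy Password', match.group(1)


class SketchCollection:
    """Hypothetical stand-in for SecretsCollection."""

    def __init__(self, plugins):
        self.plugins = plugins
        # Maps (type, filename, hashed_secret) to the first line number seen.
        self.found = {}

    def scan_lines(self, filename, lines):
        for line_number, line in enumerate(lines, start=1):
            # Every plugin gets a chance to inspect every line.
            for plugin in self.plugins:
                for type_, secret in plugin(line):
                    digest = hashlib.sha1(secret.encode('utf-8')).hexdigest()
                    key = (type_, filename, digest)
                    # Repeated usages in the same file are recorded once.
                    self.found.setdefault(key, line_number)


collection = SketchCollection([toy_password_plugin])
collection.scan_lines('settings.py', [
    'password = "hunter2"',
    'debug = True',
    'password = "hunter2"',
])
```

Keying the results on type, filename, and hash means the second usage of the same secret in `settings.py` does not create a second entry.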

##### 3. Outputting

We need to be able to create a baseline from a `SecretsCollection`, so that it
can be used for future comparisons. In the same spirit as the `PotentialSecret`
object, the output is designed for **high readability**, and may contain other
metadata that aids human analysis (e.g. the `generated_at` time).
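
The output step might look like the sketch below. The function `format_baseline`, the `results` and `version` keys, and the timestamp format are assumptions for illustration; only `generated_at` is named in the text above.

```python
import json
from datetime import datetime, timezone


def format_baseline(secrets, version, generated_at=None):
    """Serialize secrets plus metadata into indented, stably ordered JSON."""
    if generated_at is None:
        generated_at = datetime.now(timezone.utc)
    payload = {
        'generated_at': generated_at.strftime('%Y-%m-%dT%H:%M:%SZ'),
        'version': version,
        # Sorting keeps the file stable between runs, which eases review.
        'results': sorted(
            secrets,
            key=lambda s: (s['filename'], s['line_number']),
        ),
    }
    return json.dumps(payload, indent=2, sort_keys=True)


output = format_baseline(
    secrets=[{
        'type': 'Base64 High Entropy String',
        'filename': 'test_data/config.yaml',
        'line_number': 5,
        'hashed_secret': 'bc9160bc0ff062e1b2d21d2e59f6ebaba104f051',
        'is_secret': False,
    }],
    version='0.0.0',
    generated_at=datetime(2019, 5, 29, tzinfo=timezone.utc),
)
```

Indentation and sorted keys are a design choice that favors human readers and clean diffs over compactness.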
