This is a sample Dataflow pipeline that detects and masks US Social Security numbers (SSNs) in log payloads at ingestion time. The folder contains the following files:
- Dockerfile for building a custom Dataflow job container, which reduces the job's initialization time
- Boilerplate code that implements a pipeline streaming log entries from Pub/Sub to a destination log bucket
- Final version of the pipeline, with all the modifications to the boilerplate code that are required to implement log redaction
- Requirements file that installs the missing component(s) into the Dataflow job's environment
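The core of the redaction step is to transform each log entry so that SSN-shaped values in its payload are masked before the entry reaches the log bucket. The sketch below illustrates that idea locally with only the Python standard library. It is an assumption-laden stand-in: the sample itself relies on the DLP API for detection, whereas this sketch swaps in a simple regular expression for the `NNN-NN-NNNN` layout. The function names `mask_ssns` and `redact_log_entry` are hypothetical, and only the `textPayload` field of a Cloud Logging `LogEntry` is handled.

```python
import copy
import re

# Hypothetical stand-in for DLP detection: matches only the NNN-NN-NNNN layout.
# The DLP US_SOCIAL_SECURITY_NUMBER detector is far more thorough than this.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssns(text: str, masking_char: str = "#") -> str:
    """Replace every digit of each SSN-shaped match with masking_char."""
    return SSN_PATTERN.sub(
        lambda match: re.sub(r"\d", masking_char, match.group()), text
    )


def redact_log_entry(entry: dict) -> dict:
    """Return a copy of a LogEntry-shaped dict with its textPayload masked.

    Only textPayload is handled; a jsonPayload would need a recursive walk.
    The input dict is left unmodified.
    """
    redacted = copy.deepcopy(entry)
    if isinstance(redacted.get("textPayload"), str):
        redacted["textPayload"] = mask_ssns(redacted["textPayload"])
    return redacted


if __name__ == "__main__":
    entry = {
        "logName": "projects/my-project/logs/app",  # hypothetical log name
        "textPayload": "user ssn=123-45-6789 ok",
    }
    print(redact_log_entry(entry)["textPayload"])  # user ssn=###-##-#### ok
```

In an Apache Beam pipeline, a function like this could be applied per element with `beam.Map(...)` after decoding each Pub/Sub message from JSON bytes into a dict; this wiring is an assumption, not necessarily how the sample's final pipeline is structured.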
If you have a Google Cloud account and access to a Google Cloud project, you can launch an interactive tutorial in the Cloud Console to see how the sample works. To run the tutorial, press the button below.
NOTE: To run this tutorial, you need permissions to enable Google APIs; to provision Pub/Sub, Dataflow, and Cloud Storage resources; and to call the DLP API.
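The DLP API permission is needed because the redaction step sends payload text to the DLP API for inspection and masking. As a minimal sketch, assuming the `google-cloud-dlp` Python client, a de-identification request that masks US SSNs with `#` could be assembled as a plain dict. The helper name `build_deidentify_request` and the `global` location are assumptions for illustration; the sample's actual request configuration may differ.

```python
def build_deidentify_request(project_id: str, text: str) -> dict:
    """Build a DLP deidentify_content request that masks US SSNs with '#'.

    Hypothetical helper: it only assembles the request; no API call is made.
    """
    return {
        # Assumes the global DLP location; a regional location also works.
        "parent": f"projects/{project_id}/locations/global",
        # Tell DLP which infoType to look for.
        "inspect_config": {
            "info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
        },
        # Replace each character of every finding with the masking character.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "character_mask_config": {"masking_character": "#"}
                        }
                    }
                ]
            }
        },
        # The log payload text to inspect and mask.
        "item": {"value": text},
    }
```

With the client library installed and the permissions above granted, such a dict could be passed as `dlp_v2.DlpServiceClient().deidentify_content(request=...)`, with the masked text returned in `response.item.value`. Building the request separately keeps it easy to inspect and test without network access.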