This module spins up a bare-bones data lake for demos and for testing Kerberos integration with other services (e.g. Airflow or Dataflow). It is not meant for production use.
This includes:
- Multi-tenant Hadoop cluster with Hive / Spark / Presto (Dataproc)
- Kerberos (MIT KDC)
- Hive metastore (served from a Dataproc cluster; perhaps DPMS in the future)
KMS keys cannot be deleted, so this module fails when it tries to destroy KMS keys or key rings. The workaround is to remove the key from the Terraform state:
terragrunt state rm module.test_data_lake.module.kms.google_kms_crypto_key.key_ephemeral[0]
Then, on re-apply, use a different key ring name. You should also taint your Dataproc clusters and the encrypted principals null resource so that they are re-created on the next apply, with the new secrets encrypted under the new KMS key:
terragrunt taint module.test_data_lake.null_resource.encrypted_principals
terragrunt taint module.test_data_lake.google_dataproc_cluster.kdc_cluster
terragrunt taint module.test_data_lake.google_dataproc_cluster.metastore_cluster
terragrunt taint module.test_data_lake.google_dataproc_cluster.analytics_cluster
Name | Version |
---|---|
terraform | >= 0.12.17 |
google | >= 3.38.0, < 3.41.0 |
Name | Version |
---|---|
google | >= 3.38.0, < 3.41.0 |
google-beta | n/a |
null | n/a |
Name | Description | Type | Default | Required |
---|---|---|---|---|
analytics_cluster | name for analytics dataproc cluster | string | "analytics-cluster" | no |
analytics_realm | Kerberos realm for analytics clusters to use | string | "ANALYTICS.FOO.COM" | no |
corp_kdc_realm | Kerberos realm to represent centralized kerberos identities | string | "FOO.COM" | no |
data_lake_super_admin | User email for super admin rights on data lake | any | n/a | yes |
dataproc_kms_key | Name for KMS Key for kerberized dataproc | string | "dataproc-key" | no |
dataproc_subnet | self link for VPC subnet in which to spin up dataproc clusters | any | n/a | yes |
kdc_cluster | name for kdc dataproc cluster | string | "kdc-cluster" | no |
kms_key_ring | Name for KMS Keyring | string | "dataproc-kerberos-keyring" | no |
metastore_cluster | name for Hive Metastore dataproc cluster | string | "metastore-cluster" | no |
metastore_realm | Kerberos realm for hive metastore to use | string | "HIVE-METASTORE.FOO.COM" | no |
project | GCP Project ID in which to deploy data lake resources | any | n/a | yes |
region | GCP Compute region in which to deploy dataproc clusters | string | "us-central1" | no |
tenants | list of non-human kerberos principals (one per tenant) to be created as unix users on each cluster | list(string) | [ | no |
users | list of human kerberos principals to be created as unix users on each cluster | list(string) | [ | no |
zone | GCP Compute zone in which to deploy dataproc clusters | string | "us-central1-f" | no |
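
As a rough illustration of how the inputs above fit together, a root-module call might look like the sketch below. The module name `test_data_lake` matches the state addresses used earlier in this README; the `source` path, project ID, subnet self link, and email address are placeholders assumed for the example, not values taken from this module.

```hcl
# Minimal sketch of calling this module. All values are placeholders.
module "test_data_lake" {
  # Assumed path: point this at wherever the module actually lives.
  source = "../modules/data_lake"

  # Required inputs (no defaults).
  project               = "my-demo-project"
  dataproc_subnet       = "https://www.googleapis.com/compute/v1/projects/my-demo-project/regions/us-central1/subnetworks/my-subnet"
  data_lake_super_admin = "admin@example.com"

  # Optional overrides; omit these to keep the defaults listed in the table.
  region = "us-central1"
  zone   = "us-central1-f"

  # After removing an old KMS key from state, pick a key ring name that has
  # not been used before (hypothetical name shown here).
  kms_key_ring = "dataproc-kerberos-keyring-v2"
}
```

Only the three required inputs must be set; everything else falls back to its default.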
Name | Description |
---|---|
analytics_cluster_fqdn | Fully qualified domain name for cluster on which to run presto / spark jobs |
gcs_encrypted_keytab_path | GCS path to keep keytabs |
kms_key | kms key for decrypting keytabs |
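
If the module is called as `module.test_data_lake` (as in the state addresses earlier in this README), the outputs listed above could be re-exported from the root module along these lines. This is a sketch; the root-level output names are a choice made for the example.

```hcl
# Re-export the module outputs so they show up in `terraform output`.
output "analytics_cluster_fqdn" {
  value = module.test_data_lake.analytics_cluster_fqdn
}

output "gcs_encrypted_keytab_path" {
  value = module.test_data_lake.gcs_encrypted_keytab_path
}

output "kms_key" {
  value = module.test_data_lake.kms_key
}
```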