Kirhog/gcp-token-broker

GCP Token Broker

Notice: This is an alpha release of the GCP Token Broker. This project might be changed in backward-incompatible ways and is not subject to any SLA or deprecation policy.

About this project

The GCP Token Broker enables end-to-end Kerberos security and Cloud IAM integration for Hadoop workloads on Google Cloud Platform (GCP).

This project aims to achieve the following goals:

  • Bridge the gap between Kerberos and Cloud IAM to allow users to log in with Kerberos and access GCP resources.
  • Enable multi-tenancy for Hadoop clusters on Compute Engine and Cloud Dataproc.
  • Enable user impersonation by Hadoop services such as Hive, Presto, or Oozie.

This project also strives to address the following requirements, which many enterprise customers have when they're looking to migrate on-premise workloads to the cloud:

  • All access to GCP resources (Cloud Storage, Google BigQuery, Cloud Bigtable, etc) should be attributable to the individual users who initiated the requests.
  • No long-lived credentials should be stored on client machines or worker nodes.
  • Existing on-premise security systems and user workflows should require as few changes as possible.

Repository's contents

This repository contains:

  • apps: Server applications, including:
    • authorizer: Web UI for the OAuth flow that users must go through to authorize the broker service.
    • broker: The broker service itself.
  • deploy: Helm charts for deploying the broker service and the authorizer app to a Kubernetes cluster.
  • connector: Extension for the GCS Connector to allow Hadoop to communicate with the broker.
  • init-action: Initialization action to install the broker dependencies in a Cloud Dataproc cluster.
  • load-testing: Scripts for running load tests for the broker service.
  • terraform: Terraform scripts to deploy a sample demo environment. This is provided only as a reference and should not be used as-is in production.

Roadmap

Included in the current alpha release:

  • Full lifecycle of Hadoop-style delegation tokens: creation, renewal, cancellation.
  • Support for Hadoop-style proxy users.
  • Authentication backend: Kerberos.
  • Target GCP service: Cloud Storage.
  • Database backends: Cloud Datastore, JDBC.
  • Cache backend: Redis on Cloud Memorystore.

Plans for the beta & stable releases:

  • Performance optimizations.
  • Rewrite of server application from Python to Java.
  • API stabilization.

Plans for future releases:

  • Target GCP services: BigQuery, Cloud Bigtable, Cloud PubSub.
  • Database backends: Cloud Firestore, Cloud Bigtable.
  • Cache backends: Memcached, Cloud Bigtable.
  • Support for more authentication backends: TBD.

Design

The broker's role is two-fold:

  • Map Kerberos identities to Google Cloud Identities.
  • Generate access tokens to enable authenticated Kerberos principals to access GCP resources.
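
As a rough illustration of the first role, the sketch below parses a Kerberos principal and maps it to a Cloud Identity. The mapping rule (keep the user name, replace the realm with a domain) is an assumption made for this example, and the names parse_principal and to_cloud_identity are hypothetical, not part of the broker's code:

```python
from typing import NamedTuple


class KerberosPrincipal(NamedTuple):
    primary: str   # e.g. "alice" or "hive"
    instance: str  # host part of a service principal, "" for plain users
    realm: str     # e.g. "EXAMPLE.COM"


def parse_principal(name: str) -> KerberosPrincipal:
    """Split "primary[/instance]@REALM" into its components."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, _, instance = local.partition("/")
    return KerberosPrincipal(primary, instance, realm)


def to_cloud_identity(principal_name: str, domain: str) -> str:
    """Assumed mapping rule: keep the primary, swap the realm for the domain."""
    principal = parse_principal(principal_name)
    return f"{principal.primary}@{domain}"
```

With this rule, a principal "alice" in the realm EXAMPLE.COM would map to the "alice" account in the example.com domain. A real deployment might need a different rule, for example when Kerberos user names and Google account names differ.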

The broker service relies on SPNEGO tokens to authenticate users. A SPNEGO token is basically a service ticket that contains the requester principal's username and that is encrypted with the requested service principal's key. Only the service principal can decrypt that token to reveal the encrypted username.

The broker service does not communicate directly with the KDC to decrypt SPNEGO tokens. Instead, it uses keytabs to do the decryption offline. You can use either one of the following two approaches to manage principals and keytabs for the broker service:

  • Create a "broker/" principal in each realm (for example "broker/@USER_REALM" in your users realm and "broker/@HADOOP_CLUSTER" in your Hadoop cluster's realm). Then create a separate keytab for each one of the broker principals (i.e. one keytab per realm) and upload the keytabs to the broker service.
  • Create a single "broker/" principal in one of your realms — either a separate realm dedicated to the broker, or a realm that already hosts principals for other services. Then set up uni-directional trust from your broker's realm to the realms that host the users who need to access the broker service. Then create a keytab for the broker principal and upload it to the broker service.

Once the broker service has the appropriate keytab(s), it is able to authenticate users.

When a client wants to call the broker service, the GCS Connector calls the KDC associated with the broker's realm. The KDC then creates a new SPNEGO token for the user and encrypts it with the broker principal's key. Once the GCS Connector receives the new SPNEGO token, it sends the request to the broker and passes the SPNEGO token inside a request header.

When the broker receives the request, it uses its available keytab(s) to attempt to decrypt the provided SPNEGO token. If the decryption succeeds, then the broker can retrieve the full username from the encrypted SPNEGO token and trust that it is a legitimate authenticated user.
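
The first step on the broker side, reading the token out of the request header, might look like the sketch below. It assumes the token travels in an "authorization" header using the "Negotiate" scheme, as in HTTP-based SPNEGO; the header name actually used by the broker is not stated here, and extract_spnego_token is a hypothetical helper. Decrypting the token with a keytab requires a GSSAPI library and is left out:

```python
import base64
import binascii
from typing import Mapping


def extract_spnego_token(headers: Mapping[str, str]) -> bytes:
    """Return the raw SPNEGO token bytes carried in a Negotiate header."""
    # Assumption: header name and scheme follow the HTTP Negotiate convention.
    value = headers.get("authorization", "")
    scheme, _, encoded = value.partition(" ")
    if scheme.lower() != "negotiate" or not encoded:
        raise ValueError("missing Negotiate credentials")
    try:
        return base64.b64decode(encoded.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("malformed SPNEGO token") from exc
```

The returned bytes are what the broker would then try to decrypt with each of its keytabs.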

The following sub-sections describe how the broker enables different modes of authentication: direct authentication, delegated authentication, and proxy-user impersonation.

Direct authentication

With this mode of authentication, a user directly obtains an access token from the broker by authenticating with Kerberos. The broker can authenticate the user by using a Kerberos keytab. This mode of authentication is used for simple use cases, for example to list the contents of a bucket:

hadoop fs -ls gs://[my-bucket]

The following diagram illustrates the overall architecture for direct authentication:

The following sequence diagram describes the workflow for direct authentication:

Delegated authentication

Delegated authentication is performed in more complex use cases that involve running a job across multiple distributed tasks. For example:

hadoop jar wordcount.jar wordcount gs://[input-bucket]/data.txt gs://[output-bucket]

The broker's delegated authentication mechanism plugs into Hadoop's standard delegation token facilities. This allows the broker to be compatible with many tools in the Hadoop ecosystem like Yarn, Spark, or Hive, without any changes required for those tools.

The following diagram illustrates the overall architecture for delegated authentication:

The following sequence diagram describes the workflow for delegated authentication:

The renewal process is handled automatically by Yarn. Here is how it works:

  1. The client calls the new GCS Connector extension's FileSystem.getDelegationToken method, which instantiates a TokenIdentifier, which calls the broker to obtain a new delegation token.
  2. The client passes the delegation token to the job application context. The delegation token is also securely stored in the UserGroupInformation currentUser's credentials.
  3. The Yarn Resource Manager (YRM) calls the GCS Connector extension's TokenRenewer.renew method, which calls the broker to extend the delegation token's lifetime (configurable, defaults to 1 day). The YRM does this to verify that the "yarn" user is authorized to renew the delegation token and to catch potential issues early before submitting the job.
  4. The job is submitted and the tasks start running.
  5. When a task needs to access GCS, it calls the GCS Connector extension's AccessTokenProvider.refresh method, which calls the broker to get a new access token, which is valid for 1 hour. The task then uses that access token to access GCS.
  6. When the access token expires (after 1 hour), the next request to GCS fails. The GCS Connector detects the failure, calls the broker again and trades the delegation token for a new access token, then retries the GCS request with the new access token. The task's work then resumes as normal.
  7. When a delegation token is at ~90% of its lifetime, the YRM calls the TokenRenewer.renew method again to extend the delegation token lifetime by another renewal period. This ensures that the delegation token used by the tasks remains valid for the entire duration of the job.
  8. At the end of the job, the YRM calls the GCS Connector extension's TokenRenewer.cancel method, which calls the broker to cancel the delegation token. At that point, the delegation token is rendered unusable.
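
Step 6 above is a retry-after-refresh pattern. The following is a minimal sketch of that pattern only, not the GCS Connector's actual code (which is written in Java); TokenExpiredError and call_with_refresh are hypothetical names:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenExpiredError(Exception):
    """Hypothetical error raised when GCS rejects an expired access token."""


def call_with_refresh(
    request: Callable[[str], T],
    refresh_access_token: Callable[[], str],
    access_token: str,
) -> T:
    """Run request(token); on expiry, fetch a fresh token and retry once."""
    try:
        return request(access_token)
    except TokenExpiredError:
        # Trade the delegation token for a new access token, then retry.
        new_token = refresh_access_token()
        return request(new_token)
```

Retrying only once keeps a persistent failure (for example a cancelled delegation token) from turning into an endless loop.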

While the job is running, the broker keeps some details in its database about the session associated with the delegation token:

  • id: Automatically generated unique ID for the session.
  • creation_time: Time at which the delegation token was created, that is, just before the job started.
  • expires_at: Time at which the delegation token will expire. If the Yarn Resource Manager renews the token, then this time will be extended to some time in the future (24 hours later, by default).
  • owner: Name of the original Kerberos principal who submitted the job.
  • renewer: Name of the Kerberos principal who is authorized to renew and cancel the token.
  • scope: Google API scope used to generate access tokens.
  • target: Name of the bucket associated with the delegation token.
  • password: Secret associated with the session. Used to validate that the delegation token provided by the client was in fact created by the broker.
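
To make the session record more concrete, here is a sketch of it as a Python dataclass. The field names come from the list above; the field types, the way the ID and password are generated, and the choice to extend the expiry from the renewal time are assumptions made for illustration:

```python
import hmac
import secrets
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta

RENEW_PERIOD = timedelta(hours=24)  # default renewal period per the text


@dataclass
class Session:
    owner: str      # Kerberos principal who submitted the job
    renewer: str    # principal allowed to renew and cancel the token
    scope: str      # Google API scope used for access tokens
    target: str     # bucket associated with the delegation token
    creation_time: datetime
    expires_at: datetime
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    password: str = field(default_factory=lambda: secrets.token_urlsafe(32))

    def renew(self, now: datetime) -> None:
        """Push the expiry one renewal period past `now` (assumed policy)."""
        self.expires_at = now + RENEW_PERIOD

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at

    def check_password(self, candidate: str) -> bool:
        """Constant-time comparison of the presented secret."""
        return hmac.compare_digest(self.password.encode(), candidate.encode())
```

Comparing the secret with hmac.compare_digest avoids leaking, through timing, how much of a guessed password matched.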

Proxy user impersonation

Principals for applications like Hive, Oozie, or Presto are typically configured as proxy users (also sometimes referred to as "super users"). The proxy user privilege allows those services to impersonate other kerberized users, similarly to how the root user can impersonate any other users on Linux. This allows, for example, Hive to execute the UserGroupInformation.doAs() method to read files from HDFS on behalf of the user who is running a Hive query.

The broker can handle proxy users. For that, you just need to specify the list of authorized proxy users in the broker's settings, for example:

PROXY_USER_WHITELIST='hive/[email protected],oozie/[email protected]'

When a Hive job is running, the GCS connector calls the broker and sends a SPNEGO token for the logged-in principal (e.g. "hive") in the request authentication header and the currentUser's username (e.g. "alice") as a request parameter. The broker then checks that the SPNEGO token's Kerberos principal is in the PROXY_USER_WHITELIST setting, and if so, returns a GCP access token for the "[email protected]" Cloud Identity.
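
The check described above can be sketched as follows. This is an illustration of the logic only; parse_whitelist and effective_user are hypothetical names, and the principals used in the usage note are made up:

```python
from typing import FrozenSet, Optional


def parse_whitelist(setting: str) -> FrozenSet[str]:
    """Split a comma-separated PROXY_USER_WHITELIST value into principals."""
    return frozenset(item.strip() for item in setting.split(",") if item.strip())


def effective_user(
    authenticated: str,
    impersonated: Optional[str],
    whitelist: FrozenSet[str],
) -> str:
    """Return the user that the access token should be issued for."""
    if impersonated is None:
        # No impersonation requested: the caller acts as itself.
        return authenticated
    if authenticated not in whitelist:
        raise PermissionError(f"{authenticated} may not impersonate {impersonated}")
    return impersonated
```

For example, with a whitelist containing the made-up principal "hive/hive.example.internal@EXAMPLE.COM", a request authenticated as that principal and carrying "alice" as the impersonated user resolves to "alice", while the same request from any other principal is rejected.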

Creating a demo GCP environment

This section describes how to set up a demo environment to test the broker service.

Important note: The following instructions are provided only as a reference to create the demo environment and should not be used as-is in production.

The demo environment will help you understand and test the broker's functionality in a practical way. Once you become familiar with the broker's codebase and functionality, you can look to adapt the demo to your production or staging environments (For more details on that, refer to the "Production considerations" section).

The following diagram illustrates the demo environment's architecture:

Notes about the architecture:

  • The broker's server application is implemented in Python and is deployed with Kubernetes on Kubernetes Engine. The broker's Kubernetes spec automatically deploys an internal load balancer to balance traffic across the broker server pods.
  • Interactions between clients and the broker are done with gRPC and protocol buffers.
  • The Origin KDC is the source of truth for Kerberos users. Alternatively it could be replaced with an Active Directory or Open LDAP instance.
  • All machines for the broker, KDC, and clients are deployed in private networks with RFC 1918 IP addresses. Cloud NAT gateways are deployed for cases where machines need to access the internet. Private connectivity between Hadoop client machines (e.g. on Compute Engine or Cloud Dataproc), the broker service, and the Origin KDC is established over VPC peering. Alternatively, private connectivity could also be established with Cloud VPN or Shared VPC. Private Google Access is enabled on the VPC subnets to allow machines to access Google services like Cloud Datastore, Cloud Storage or Cloud KMS.

Prerequisites

Before you start, you must set up some prerequisites for the demo:

  1. Register a new domain name with your preferred domain registrar. This is recommended so you can test in a self-contained environment.
  2. Create a new GSuite organization associated with the new domain name.
  3. Create 3 new non-admin users in the organization (e.g. "alice", "bob", and "john").
  4. Create a new GCP project under the GSuite organization and enable billing.
  5. Install some tools on your local machine (The versions indicated below are the ones that have been officially tested. Newer versions might work but are untested):

Deploying the demo architecture

The demo environment is comprised of multiple components and GCP products (Cloud Datastore, Cloud Memorystore, VPCs, firewall rules, etc.), which are automatically deployed using Terraform.

Follow these steps to deploy the demo environment to GCP:

  1. Log in as the Google user who owns the GCP project:

    gcloud auth application-default login
  2. Run the following commands to set some default configuration values for gcloud. Replace [your-project-id] with your GCP project ID, and [your-zone-of-choice] with your preferred zone (See list of available zones):

    gcloud config set project [your-project-id]
    gcloud config set compute/zone [your-zone-of-choice]
  3. Change into the terraform directory:

    cd terraform
  4. Create a terraform.tfvars file in the terraform directory with the following configuration (Update the values as needed. Also make sure to use the same gcp_zone as you selected in the above step, and its corresponding gcp_region):

    gcp_project = "[your-project-id]"
    gcp_region = "us-west1"
    gcp_zone = "us-west1-a"
    datastore_region = "us-west2"
    domain = "[your.domain.name]"
    authorizer_hostname = "[your.authorizer.hostname]"
    origin_realm = "[YOUR.REALM.NAME]"
    test_users = ["alice", "bob", "john"]
    

    Notes:

    • domain is the domain name (e.g. "your-domain.com") that you registered in the Prerequisites section for your GSuite organization.
    • datastore_region is the region for your Cloud Datastore database. See the list of available regions for Cloud Datastore.
    • authorizer_hostname is the host name (e.g. "authorizer.your-domain.com") that you wish to use to access the authorizer app. This value will be used to configure the authorizer app's load balancer.
    • origin_realm is the Kerberos realm (e.g. "YOUR-DOMAIN.COM") that you wish to use for your test users. This value can be a totally arbitrary string, and is generally made of UPPERCASE letters.
    • Replace the test_users with the usernames of the three users that you created in the Prerequisites section.
  5. Run: terraform init

  6. Run: terraform apply
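
When filling in terraform.tfvars, gcp_region must match gcp_zone. For zones named like "us-west1-a", the region is the zone name without its last dash-separated part, which is the same rule that the shell expansion ${ZONE%-*} applies later in this guide. A small sketch of that rule (region_of is a hypothetical helper, and it assumes the usual region-plus-letter zone naming):

```python
def region_of(zone: str) -> str:
    """Derive the region by dropping the trailing "-<letter>" zone suffix."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"unexpected zone name: {zone!r}")
    return region
```

For the sample values above, region_of("us-west1-a") gives "us-west1".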

Configuring the OAuth client

  1. Add an A DNS record in your domain registrar for your authorizer app's domain name. For the A record's IP address, use the IP returned by the following command:

    gcloud compute addresses describe authorizer-ip --global --format="value(address)"
  2. Create an OAuth consent screen:

    • Go to: https://console.cloud.google.com/apis/credentials/consent
    • For "Application type", select "Internal".
    • For "Application name", type "GCP Token Broker".
    • For "Scopes for Google APIs", click "Add scope", then search for "Google Cloud Storage JSON API", then tick the checkbox for "auth/devstorage.read_write", then click "Add".
    • For "Authorized domains", type your domain name then press Enter on your keyboard to add it to the list.
    • Click "Save".
  3. Create a new OAuth client ID:

    • Go to: https://console.cloud.google.com/apis/credentials
    • Click "Create credentials" > "OAuth client ID"
    • For "Application type", select "Web application".
    • For "Name", type "GCP Token Broker".
    • Leave "Authorized JavaScript origins" blank.
    • For "Authorized redirect URIs":
      • Type the following (Replace [your.authorizer.hostname] with your authorizer app's host name): https://[your.authorizer.hostname]/google/auth
      • Press "Enter" on your keyboard to add the URI to the list.
    • Click "Create".
    • Click "Ok" to close the confirmation popup.
    • Click the "Download JSON" icon for your client ID.
    • Move the downloaded JSON file to the code repository's root, then rename it to client_secret.json.

Enabling audit logs for GCS

Follow these steps to enable GCS audit logs:

  1. Go to: https://console.cloud.google.com/iam-admin/audit
  2. In the "Filter table" text box, type "Google Cloud Storage" then press the "Enter" key.
  3. Click on the "Google Cloud Storage" entry.
  4. Tick the 3 checkboxes: "Admin Read", "Data Read", "Data Write".
  5. Click "Save".

Creating TLS certificates

The broker and authorizer apps both use TLS encryption when serving requests.

You may choose to use your own domain, certificates, and trusted Certificate Authority. Alternatively, for development and testing purposes only, you may create self-signed certificates as described below.

Run the following commands from the root of the repository:

  • Create broker certificate:

    BROKER_DOMAIN="10.2.1.255.xip.io"
    openssl genrsa -out broker-tls.key 2048
    openssl req -new -key broker-tls.key -out broker-tls.csr -subj "/CN=${BROKER_DOMAIN}"
    openssl x509 -req -days 365 -in broker-tls.csr -signkey broker-tls.key -out broker-tls.crt
    openssl pkcs8 -topk8 -nocrypt -in broker-tls.key -out broker-tls.pem
  • Create authorizer certificate (Replace [your.authorizer.hostname] with your authorizer app's host name):

    AUTHORIZER_DOMAIN="[your.authorizer.hostname]"
    openssl genrsa -out authorizer-tls.key 2048
    openssl req -new -key authorizer-tls.key -out authorizer-tls.csr -subj "/CN=${AUTHORIZER_DOMAIN}"
    openssl x509 -req -days 365 -in authorizer-tls.csr -signkey authorizer-tls.key -out authorizer-tls.crt

Deploying the broker service

To deploy the broker service, run the following commands from the root of the repository:

  1. Download the broker app's JAR:

    export BROKER_VERSION=$(cat VERSION)
    mkdir -p apps/broker/target
    curl https://repo1.maven.org/maven2/com/google/cloud/broker/broker/$BROKER_VERSION/broker-$BROKER_VERSION-jar-with-dependencies.jar > apps/broker/target/broker-$BROKER_VERSION-jar-with-dependencies.jar
    
  2. Set some environment variables:

    export PROJECT=$(gcloud info --format='value(config.project)')
    export ZONE=$(gcloud info --format='value(config.properties.compute.zone)')
  3. Configure credentials for the cluster:

    gcloud container clusters get-credentials broker
  4. Create a Kubernetes service account with the cluster admin role for Tiller, the Helm server:

    kubectl create serviceaccount --namespace kube-system tiller
    kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
  5. Install Helm tiller in the cluster:

    helm init --service-account tiller
  6. Create the Broker secrets:

    kubectl create secret generic broker-secrets \
      --from-file=client_secret.json \
      --from-file=tls.pem=broker-tls.pem \
      --from-file=tls.crt=broker-tls.crt
  7. Create the Authorizer secrets:

    openssl rand -base64 32 > authorizer-flask-secret.key
    
    kubectl create secret generic authorizer-secrets \
      --from-file=client_secret.json \
      --from-file=authorizer-flask-secret.key \
      --from-file=tls.key=authorizer-tls.key \
      --from-file=tls.crt=authorizer-tls.crt
  8. Create the skaffold.yaml configuration file:

    cd deploy
    sed -e "s/PROJECT/$PROJECT/" skaffold.yaml.template > skaffold.yaml
  9. Deploy to Kubernetes Engine:

    skaffold dev -v info

    Note: The first time you run the skaffold command, it might take a few minutes for the container images to build and get uploaded to the container registry.

  10. Wait until an external IP has been assigned to the broker service. You can check the status by running the following command in a different terminal, and by looking up the EXTERNAL-IP value:

    kubectl get service broker-service

Using the authorizer

The Authorizer is a simple Web UI that users must use, only once, to authorize the broker. The authorization process consists of a simple OAuth flow:

  1. Open the authorizer page in your browser (https://[your.authorizer.hostname]).

    Notes:

    • If you're trying to access the authorizer page right after deploying the authorizer app with the skaffold command, your browser might return an error with a 502 code when loading the authorizer page. This means that the load balancer is still being deployed. It might take a few minutes for this deployment to complete. Wait for a few seconds, and then refresh the page. Try this until the page works and the authorizer UI appears.
    • If you used a self-signed certificate for the authorizer app, the browser will display a warning (In Chrome, you see a message that says "Your connection is not private"). You can ignore this warning and proceed to loading the page (In Chrome, click the "Advanced" button then click the "Proceed" link).
  2. Click "Authorize". You are redirected to the Google login page.

  3. Enter the credentials for one of the three users you created in the Prerequisites section.

  4. Read the consent form, then click "Allow". You are redirected back to the authorizer page, and are greeted with a "Success" message. The broker now has authority to generate GCP access tokens on the user's behalf.

Creating a Dataproc cluster

In this section, you create a Dataproc cluster to run sample Hadoop jobs and interact with the broker.

Run the following commands from the root of the repository:

  1. Set an environment variable for the Kerberos realm (Replace [ORIGIN.REALM.COM] with the same Kerberos realm you used in the terraform.tfvars file):

    export REALM=[ORIGIN.REALM.COM]
  2. Set a few more environment variables:

    export PROJECT=$(gcloud info --format='value(config.project)')
    export ZONE=$(gcloud info --format='value(config.properties.compute.zone)')
    export REGION=${ZONE%-*}
    export ORIGIN_KDC_HOSTNAME=$(gcloud compute instances describe origin-kdc --format="value(networkInterfaces[0].networkIP)").xip.io
    export BROKER_SERVICE_HOSTNAME="10.2.1.255.xip.io"
    export BROKER_VERSION=$(cat VERSION)
  3. Create the Kerberos configuration file for Dataproc:

    cat > kerberos-config.yaml << EOL
    root_principal_password_uri: gs://${PROJECT}-secrets/root-password.encrypted
    kms_key_uri: projects/$PROJECT/locations/$REGION/keyRings/dataproc-key-ring/cryptoKeys/dataproc-key
    cross_realm_trust:
      kdc: $ORIGIN_KDC_HOSTNAME
      realm: $REALM
      shared_password_uri: gs://$PROJECT-secrets/shared-password.encrypted
    EOL
  4. Create the Dataproc cluster:

    gcloud beta dataproc clusters create test-cluster \
      --single-node \
      --no-address \
      --zone $ZONE \
      --subnet client-subnet \
      --image-version 1.4 \
      --bucket ${PROJECT}-staging \
      --scopes cloud-platform \
      --service-account "dataproc@${PROJECT}.iam.gserviceaccount.com" \
      --initialization-actions gs://gcp-token-broker/broker-connector.${BROKER_VERSION}.sh \
      --kerberos-config-file=kerberos-config.yaml \
      --metadata "gcp-token-broker-tls-enabled=true" \
      --metadata "gcp-token-broker-tls-certificate=$(cat broker-tls.crt)" \
      --metadata "gcp-token-broker-uri-hostname=$BROKER_SERVICE_HOSTNAME" \
      --metadata "gcp-token-broker-uri-port=443" \
      --metadata "origin-realm=$REALM"

Uploading keytabs

The broker service needs a keytab to authenticate incoming requests.

  1. Download the keytab from the Dataproc cluster's realm:

    gcloud beta compute ssh test-cluster-m \
      --tunnel-through-iap \
      -- "sudo cat /etc/security/keytab/broker.keytab" | perl -pne 's/\r$//g' > broker.keytab
    
  2. Upload the keytab to the broker cluster:

    kubectl create secret generic broker-keytabs \
      --from-file=broker.keytab
  3. Restart the broker Kubernetes pods:

    helm upgrade --recreate-pods -f deploy/values_override.yaml broker deploy/broker
  4. You are now ready to do some testing. Refer to the Test scenarios section to run some sample Hadoop jobs and try out the broker's functionality.

Production considerations

This section describes some tips to further improve the deployment process, performance, and security in production environments.

Deployment process

Configuration management

The Terraform scripts in this repository are provided only as a reference to set up a demo environment. You can use those scripts as a starting point to create your own scripts for Terraform or your preferred configuration management tool to deploy the broker service to production or staging environments.

Building containers

The demo uses Skaffold to build and deploy the application containers. Note that Skaffold is mainly suitable for development purposes. In production, you should build and deploy the application containers directly with Docker.

To build the containers:

# Broker service
docker build -f ./apps/broker/Dockerfile -t gcr.io/$PROJECT/broker .
docker push gcr.io/$PROJECT/broker

# Authorizer
docker build -f ./apps/authorizer/Dockerfile -t gcr.io/$PROJECT/authorizer .
docker push gcr.io/$PROJECT/authorizer

To deploy with Helm and Kubernetes:

helm install -f deploy/values_override.yaml --name broker deploy/broker
helm install -f deploy/values_override.yaml --name authorizer deploy/authorizer

To delete the deployments:

helm delete broker --purge
helm delete authorizer --purge

Performance optimizations

Scaling out workers

Kubernetes Engine is a great platform for running the broker server application. This allows you to scale out the number of workers in multiple, combinable ways:

  • Number of deployed Kubernetes nodes, that is the number of VMs in the Kubernetes cluster, by resizing the cluster:

    gcloud container clusters resize broker --size <NEW_NUMBER_OF_NODES>
  • Number of Kubernetes pods, that is the number of running broker containers in the Kubernetes cluster, by increasing the broker.replicaCount value for the Helm chart and then running the following command:

    helm upgrade -f <VALUE_FILE.yaml> broker deploy/broker
  • Number of threads, i.e. the number of gRPC server instances running in each container, by changing the NUM_SERVER_THREADS broker setting.

You can also scale up each Kubernetes node by increasing memory and CPU resources to accommodate more workers.

Caching

The broker application caches access tokens to account for the case where hundreds or thousands of tasks might request an access token for the same user at the same time, for example at the beginning of a large Map/Reduce job. This way, all tasks accessing the broker within the cache lifetime window share the same token, which improves performance and reduces load on the Google token API.

The broker application uses two types of caching:

  • Remote caching: When a pod generates a new access token for a user, it encrypts the token with Cloud KMS, and caches the encrypted value in Redis on Cloud Memorystore for a short period of time, controlled by a setting (ACCESS_TOKEN_REMOTE_CACHE_TIME, defaults to 30 seconds).
  • Local caching: When a pod obtains an access token for a user (either after generating it or pulling it from the remote cache), it caches the token unencrypted in its local memory for a short period of time, also controlled by a setting (ACCESS_TOKEN_LOCAL_CACHE_TIME, defaults to 60 seconds).

The two settings can be adjusted to tune up performance depending on the profile of the Hadoop cluster's workloads.
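
The lookup order implied by the two cache layers can be sketched as below. This is a simplified illustration, not the broker's implementation: an in-memory TtlCache stands in for both the pod's local memory and Redis, the Cloud KMS encryption of remotely cached tokens is omitted, and all names (TtlCache, get_access_token, LOCAL_TTL, REMOTE_TTL) are hypothetical:

```python
import time
from typing import Callable, Dict, Optional, Tuple

LOCAL_TTL = 60.0   # stands in for ACCESS_TOKEN_LOCAL_CACHE_TIME (seconds)
REMOTE_TTL = 30.0  # stands in for ACCESS_TOKEN_REMOTE_CACHE_TIME (seconds)


class TtlCache:
    """Tiny in-memory cache whose entries expire after a fixed lifetime."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            self._items.pop(key, None)  # drop missing or expired entries
            return None
        return entry[1]

    def set(self, key: str, value: str, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)


def get_access_token(
    user: str,
    local: TtlCache,
    remote: TtlCache,
    generate: Callable[[str], str],
) -> str:
    """Check the local cache, then the remote cache, then mint a new token."""
    token = local.get(user)
    if token is not None:
        return token
    token = remote.get(user)
    if token is None:
        token = generate(user)
        remote.set(user, token, REMOTE_TTL)
    local.set(user, token, LOCAL_TTL)
    return token
```

In this sketch, a second pod with an empty local cache still finds the token in the shared remote cache, so only the first request within the window reaches the token API.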

You can also select a different remote cache backend (e.g. Memcached) with the REMOTE_CACHE setting.

Scalable database

To do its work, the broker needs to store some state, most notably refresh tokens and broker session details. Cloud Datastore is a great option because of its high scalability and ease of use. For extreme loads, consider using Cloud Bigtable instead for its sub-10ms latency. You can select your preferred database backend with the DATABASE_BACKEND setting.

Security hardening

This section describes different ways to further harden security for the deployment.

Project structure

The broker service has a lot of power as it holds sensitive secrets (e.g. refresh tokens) and has the capacity to generate access tokens for other users. Therefore it is highly recommended to keep the broker and its core components (Kubernetes cluster, cache, database, etc) in a separate project, and to only allow a privileged group of admin users to access its resources. Client machines can be allowed to access the broker service's API through private network connectivity and via Kerberos authentication.

IP-based controls

It is recommended to restrict access to the broker service's API from specific client clusters. This can be done by setting specific IP ranges for the loadBalancerSourceRanges parameter in the broker's Kubernetes configuration.

Access to GCS buckets can also be restricted by IP ranges using VPC Service Controls.
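
Before applying a set of ranges to loadBalancerSourceRanges, it can help to check that your client clusters' addresses actually fall inside them. The sketch below does that with Python's standard ipaddress module; is_allowed is a hypothetical helper and the CIDR ranges in the usage note are examples only:

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, source_ranges: Iterable[str]) -> bool:
    """True if client_ip falls inside any of the CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in source_ranges)
```

For example, with the ranges "10.1.0.0/16" and "10.3.0.0/24", the address 10.1.2.3 is allowed and 192.168.0.5 is not.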

Transport encryption

Both the broker service and the authorizer app encrypt data transport with TLS. It is highly recommended to use a trusted Certificate Authority to issue certificates for those two apps in production.

Storage encryption

The broker encrypts refresh tokens before storing them in its database. The broker also encrypts access tokens before storing them in its cache. Encryption is done using Cloud KMS keys that only privileged admin users and the broker's service account should be given access to.

Test scenarios

This section provides a few simple test scenarios that you can run on the test Dataproc cluster. To SSH into the Dataproc cluster's master node, run this command:

gcloud beta compute ssh test-cluster-m --tunnel-through-iap

Once you're SSH'ed in, log in as one of your test users with Kerberos, for example:

kinit alice@$REALM

Note: To keep the demo simple, the Kerberos passwords for the test users are hardcoded in the demo's deployment (see the details in the startup script template and the kadmin.local commands in the origin KDC's Terraform specification file). Each hardcoded password is the same as the username (e.g. the password for "alice@$REALM" is "alice"). These are not the passwords that you set for the GSuite users in the Prerequisites section; the two would match only in a production environment where the KDC's database is synced with your LDAP database.

Once the Kerberos user is logged in, you are ready to run the commands in the test scenarios described in the following sub-sections. After each command, you can verify in the GCS audit logs that the demo GCS bucket is in fact accessed by the expected GSuite user, that is "[email protected]" (See the Logging section to learn how to view the logs).

Hadoop FS

Run a simple Hadoop FS command:

hadoop fs -ls gs://$PROJECT-demo-bucket/

This scenario uses the simple direct authentication workflow described earlier, where Hadoop directly requests an access token from the broker to access GCS.

Yarn

Run a simple wordcount job:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount \
  gs://apache-beam-samples/shakespeare/macbeth.txt \
  gs://$PROJECT-demo-bucket/wordcount/output-$(uuidgen)

This scenario uses the delegated authentication workflow, where the Hadoop client first requests a delegation token from the broker, then passes the delegation token to the Yarn workers, which then call the broker again to trade the delegation token for access tokens to access GCS.

Hive

Here are some sample Hive queries you can run using the hive command:

  1. Create a Hive table:

    hive -e "CREATE EXTERNAL TABLE transactions
             (SubmissionDate DATE, TransactionAmount DOUBLE, TransactionType STRING)
             STORED AS PARQUET
             LOCATION 'gs://$PROJECT-demo-bucket/datasets/transactions';"
  2. Run a simple SELECT query:

    hive -e "SELECT * FROM transactions LIMIT 5;"

    This simple query only uses direct authentication.

  3. Run a more complex query with some aggregations:

    hive -e "SELECT TransactionType, AVG(TransactionAmount) AS AverageAmount
             FROM transactions
             WHERE SubmissionDate = '2017-12-22'
             GROUP BY TransactionType;"

    This query is distributed across multiple tasks and therefore uses delegated authentication.

The same Hive queries can also be run using beeline as follows:

beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/$(hostname -f)@$DATAPROC_REALM" \
  -e "CREATE EXTERNAL TABLE transactions
      (SubmissionDate DATE, TransactionAmount DOUBLE, TransactionType STRING)
      STORED AS PARQUET
      LOCATION 'gs://$PROJECT-demo-bucket/datasets/transactions';"

beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/$(hostname -f)@$DATAPROC_REALM" \
  -e "SELECT * FROM transactions LIMIT 5;"

beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/$(hostname -f)@$DATAPROC_REALM" \
  -e "SELECT TransactionType, AVG(TransactionAmount) AS AverageAmount
      FROM transactions
      WHERE SubmissionDate = '2017-12-22'
      GROUP BY TransactionType;"

To try different execution engines (Tez or MapReduce), add either one of the following parameters:

  • For Tez:

    --hiveconf="hive.execution.engine=tez"
  • For MapReduce:

    --hiveconf="hive.execution.engine=mr"
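To compare both engines on the same query, the parameter above can be wrapped in a small function. This is a sketch under stated assumptions: `run_hive_with_engine` is a hypothetical helper name, and it deliberately accepts only the two engines named above.

```shell
# Hypothetical helper: run a Hive query with an explicit execution engine.
# $1 = engine ("tez" or "mr"), $2 = query string.
run_hive_with_engine() {
  case "$1" in
    tez|mr) ;;
    *) echo "unsupported engine: $1" >&2; return 1 ;;
  esac
  hive --hiveconf="hive.execution.engine=$1" -e "$2"
}

# Usage: run the aggregation query once per engine.
#   for engine in tez mr; do
#     run_hive_with_engine "$engine" \
#       "SELECT TransactionType, AVG(TransactionAmount) AS AverageAmount
#        FROM transactions GROUP BY TransactionType;"
#   done
```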

SparkSQL

Follow these steps to test with SparkSQL:

  1. Start a Spark Shell session:

    spark-shell --conf "spark.yarn.access.hadoopFileSystems=gs://$PROJECT-demo-bucket"
  2. Run the following Spark code:

    import org.apache.spark.sql.hive.HiveContext
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SELECT * FROM transactions LIMIT 5").show()
  3. When you've finished your testing, exit the session:

    :q

Simulating the delegation token lifecycle

Hadoop offers different commands to simulate the delegation token lifecycle.

In Hadoop v2.X, which comes pre-installed with Cloud Dataproc, you can use the hdfs fetchdt command to simulate each step in the lifecycle to create, renew, and cancel delegation tokens:

  1. Get a delegation token for user alice:

    hdfs fetchdt -fs gs://$PROJECT-demo-bucket --renewer alice@$REALM ~/my.dt

    The delegation token is now stored in the ~/my.dt file.

  2. Renew the delegation token:

    hdfs fetchdt -fs gs://$PROJECT-demo-bucket --renew ~/my.dt

    The token's lifetime has now been extended.

  3. Cancel the delegation token:

    hdfs fetchdt -fs gs://$PROJECT-demo-bucket --cancel ~/my.dt

    The token is now cancelled and made inoperable.
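The three steps above can also be chained in one function that stops at the first failing step. This is a minimal sketch that reuses only the `hdfs fetchdt` invocations shown above; the `simulate_token_lifecycle` name is hypothetical.

```shell
# Hypothetical helper: create, renew, then cancel a delegation token,
# stopping as soon as one step fails.
# $1 = file system URI, $2 = renewer principal, $3 = token file path.
simulate_token_lifecycle() {
  hdfs fetchdt -fs "$1" --renewer "$2" "$3" || return 1
  hdfs fetchdt -fs "$1" --renew "$3" || return 1
  hdfs fetchdt -fs "$1" --cancel "$3" || return 1
}

# Usage (assumes $PROJECT and $REALM are set, as in the steps above):
#   simulate_token_lifecycle "gs://$PROJECT-demo-bucket" "alice@$REALM" ~/my.dt
```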

In Hadoop v3.X, the hdfs fetchdt command was deprecated and replaced with the dtutil command. Cloud Dataproc currently doesn't support Hadoop v3, but if you have access to a cluster with Hadoop v3, you can run the same tests as follows:

  1. Get a delegation token:

    hadoop dtutil get gs://$PROJECT-demo-bucket -alias my-alias -renewer alice@$REALM ~/my.dt
  2. Renew the delegation token:

    hadoop dtutil renew -alias my-alias ~/my.dt
  3. Cancel the delegation token:

    hadoop dtutil cancel -alias my-alias ~/my.dt

What's next?

When you're done running the above test scenarios, you could try deploying larger Dataproc clusters to run larger-scale tests. You could also adapt the provided Terraform scripts to integrate the broker service with your stack and deploy it to your staging environment for further testing.

Setting up local development environment

This section contains some tips if you're interested in making code contributions to this project.

Start a development container:

docker run -it -v $PWD:/base -w /base --detach --name broker-dev ubuntu:18.04
docker exec -it broker-dev bash -- apps/broker/install-dev.sh

To compile the broker service app:

docker exec -it broker-dev bash -c 'cd apps/broker; mvn package'

To compile the broker connector:

docker exec -it broker-dev bash -c 'cd connector; mvn package -Phadoop2'

Interacting with Redis

To interact with the Redis database, first set up some environment variables and functions:

REDIS_HOST=$(gcloud redis instances describe broker-cache --region $REGION --format="value(host)")

function redis-cli() {
  kubectl run -it --rm redis --image=redis --restart=Never --command -- redis-cli -h "$REDIS_HOST" "$@"
}

Then, here are some example Redis commands you can run:

  • List keys in a DB number:

    redis-cli -n 0 keys '*'
  • Flush all keys:

    redis-cli flushall

Logging

GCS audit logs

Follow these steps to view the GCS audit logs in Stackdriver:

  1. Open the logs viewer in Stackdriver: https://console.cloud.google.com/logs/viewer

  2. Click the down arrow in the text search box, then click "Convert to advanced filter".

  3. Type the following in the text search box (replace [PROJECT-ID] with your project ID):

    resource.type="gcs_bucket"
    resource.labels.bucket_name="[PROJECT-ID]-demo-bucket"
    logName="projects/[PROJECT-ID]/logs/cloudaudit.googleapis.com%2Fdata_access"
    
  4. Click "Submit Filter".
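The same filter can likely be used from the command line with `gcloud logging read`, which is convenient when checking the logs after each test scenario. The sketch below makes some assumptions: the `gcs_audit_filter` helper is a hypothetical name, the three filter lines are joined with an explicit `AND`, and the `--format` fields are standard audit log fields chosen here for illustration.

```shell
# Hypothetical helper: build the GCS audit log filter for a project ID.
# $1 = project ID.
gcs_audit_filter() {
  printf 'resource.type="gcs_bucket" AND resource.labels.bucket_name="%s-demo-bucket" AND logName="projects/%s/logs/cloudaudit.googleapis.com%%2Fdata_access"\n' "$1" "$1"
}

# Usage (assumes $PROJECT is set and gcloud is authenticated):
#   gcloud logging read "$(gcs_audit_filter "$PROJECT")" --project "$PROJECT" --limit 10 \
#     --format 'value(timestamp, protoPayload.authenticationInfo.principalEmail, protoPayload.methodName)'
```

The `principalEmail` column is the one to compare against the expected GSuite user.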

Broker application logs

Follow these steps to view the broker application logs in Stackdriver:

  1. Open the logs viewer in Stackdriver: https://console.cloud.google.com/logs/viewer

  2. Click the down arrow in the text search box, then click "Convert to advanced filter".

  3. Type the following in the text search box:

    resource.type="container"
    resource.labels.cluster_name="broker"
    resource.labels.namespace_id="default"
    resource.labels.container_name="broker-container"
    
  4. Click "Submit Filter".

Load testing

This repository contains some load tests that use the Locust framework.

You can run the load tests from the sample Dataproc cluster that you created for the demo.

  1. SSH into the Dataproc master instance:

    gcloud beta compute ssh test-cluster-m --tunnel-through-iap
  2. Clone the project's repository:

    git clone https://github.com/GoogleCloudPlatform/gcp-token-broker
    cd gcp-token-broker/load-testing
  3. Install some dependencies:

    ./install.sh
  4. Create the Python gRPC stubs:

    python3 -m grpc_tools.protoc --proto_path=. --python_out=. --grpc_python_out=. brokerservice/protobuf/broker.proto

  5. Create a settings.py file using the provided template.

    cp settings.py.template settings.py
  6. Edit the settings.py to set appropriate values for your setup.

  7. To run the load tests in headless mode:

    ~/.local/bin/locust --no-web -c 1000 -r 10

    The -c option sets the total number of users, and the -r option sets the hatch rate (i.e. the number of new users spawned per second). To stop the tests, press Ctrl-C.

  8. To run the tests using the Web UI, start the Locust server:

    ~/.local/bin/locust

    Then, in another terminal on your local machine, run the following command to set up a tunnel with the Dataproc master instance:

    gcloud beta compute start-iap-tunnel test-cluster-m 8089 \
      --local-host-port=localhost:8089 \
      --zone $ZONE

    Then open your browser at the address http://localhost:8089
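For the headless run in step 7, the -c and -r values together determine how long the ramp-up lasts: the total number of users divided by the hatch rate, rounded up. A small sketch of that arithmetic follows; `ramp_up_seconds` is a hypothetical helper, not part of the load-testing scripts.

```shell
# Hypothetical helper: seconds needed to spawn all users,
# computed as ceil(users / hatch_rate) with integer arithmetic.
# $1 = total number of users (-c), $2 = hatch rate (-r).
ramp_up_seconds() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

ramp_up_seconds 1000 10   # prints 100
```

With the values from step 7, all 1000 users are running after about 100 seconds, so measurements taken earlier reflect a partial load.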

Note: During the execution of load tests, you might see "Too many open files" errors. This is because all users must read the Kerberos credentials from a temporary cache file, and the limit of open files allowed by the OS might be reached. To increase the limit, run the following command:

ulimit -n 32768
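The command above fails if the requested value exceeds the hard limit of the current session. The following sketch, which assumes a bash shell, caps the request at the hard limit and never lowers a soft limit that is already high enough. The `raise_open_files_limit` name is hypothetical.

```shell
# Hypothetical helper: raise the soft open-files limit toward a target.
# $1 = desired number of open files.
raise_open_files_limit() {
  target=$1
  hard=$(ulimit -Hn)
  soft=$(ulimit -Sn)
  # Cap the target at the hard limit, which an unprivileged user cannot exceed.
  if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
    target=$hard
  fi
  # Leave the soft limit alone if it already meets the target.
  if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
    return 0
  fi
  ulimit -Sn "$target"
}

# Usage, in the same shell session that will start Locust:
#   raise_open_files_limit 32768
#   ulimit -Sn   # shows the soft limit now in effect
```

The limit applies only to the current shell and its child processes, so the function must be called in the session that launches the load tests.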

How to Contribute

We'd love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow. See the contributing guide for more details.
