This initialization action installs Apache Ranger on a Google Cloud Dataproc cluster. Apache Ranger provides centralized monitoring and management of data security across the Hadoop ecosystem and uses Apache Solr to store audit logs.
You can use this initialization action to create a new Dataproc cluster with Apache Ranger installed:

- Use the `gcloud` command to create a new cluster with this initialization action. The following command creates a new standard cluster named `<CLUSTER_NAME>`, with the Ranger Policy Manager accessible with the user `admin` and the password `<YOUR_PASSWORD>`:

  ```bash
  gcloud dataproc clusters create <CLUSTER_NAME> \
      --initialization-actions gs://$MY_BUCKET/solr/solr.sh,gs://$MY_BUCKET/ranger/ranger.sh \
      --metadata "default-admin-password=<YOUR_PASSWORD>"
  ```
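Once the command returns, you can confirm that the cluster was created and inspect its configuration, including the initialization actions that ran. The `<REGION>` placeholder is an assumption here; supply your cluster's actual region:

```shell
# Describe the cluster to verify creation and review its configuration,
# including the initialization actions listed under config.initializationActions.
gcloud dataproc clusters describe <CLUSTER_NAME> --region=<REGION>
```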
- Once the cluster has been created, the Apache Ranger Policy Manager runs on the master node and uses Solr in standalone mode for audits.
- The Policy Manager web UI is served on port 6080 by default. You can log in with the username `admin` and the password provided in the cluster metadata. Follow the instructions on connecting to cluster web interfaces to create a SOCKS5 proxy and view `http://<CLUSTER_NAME>-m:6080` in your browser.
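The SOCKS5 proxy setup above can be sketched as follows. The zone, the local proxy port (1080), and the browser invocation are assumptions to adapt for your environment:

```shell
# Open an SSH tunnel acting as a SOCKS5 proxy on local port 1080
# (assumes the default master node name <CLUSTER_NAME>-m and your cluster's zone).
gcloud compute ssh <CLUSTER_NAME>-m \
    --zone=<ZONE> \
    -- -D 1080 -N

# In a second terminal, start a browser that routes traffic through the proxy,
# then open http://<CLUSTER_NAME>-m:6080 to reach the Ranger Policy Manager.
/usr/bin/google-chrome \
    --proxy-server="socks5://localhost:1080" \
    --user-data-dir=/tmp/ranger-ui
```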
- In HA mode, Ranger uses Solr in SolrCloud mode, which is the recommended setup for auditing and for efficient querying of audit logs.
- The default admin password must be set via the mandatory `default-admin-password` metadata flag. Ranger requires a password that is at least 8 characters long and contains at least one letter and one numeric character. You can also change it after the first login.
- You can override the default port 6080 by setting the `ranger-port` metadata flag.
- The Apache Ranger Policy Manager and the usersync plugin are installed on master nodes only (`m-0` in HA mode).
- This script installs the HDFS, Hive, and YARN plugins by default.
- Ranger is only supported on Dataproc version 1.3 and above.
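The password rule above (at least 8 characters, with at least one letter and one numeric character) can be checked before passing a value to the `default-admin-password` metadata flag. A minimal sketch in Python; the function name is illustrative, not part of Ranger:

```python
import re

def is_valid_ranger_password(password: str) -> bool:
    """Check Ranger's admin password rule: at least 8 characters,
    with at least one letter and at least one numeric character."""
    return (
        len(password) >= 8
        and re.search(r"[A-Za-z]", password) is not None
        and re.search(r"[0-9]", password) is not None
    )

# Example checks:
print(is_valid_ranger_password("hadoop123"))    # True: 9 chars, letters and a digit
print(is_valid_ranger_password("short1"))       # False: fewer than 8 characters
print(is_valid_ranger_password("onlyletters"))  # False: no numeric character
```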