This project is a fork of knappek's docker-hadoop-secure and extends it with Spark. It is useful for testing Spark jobs on a pseudo-distributed Hadoop cluster.
The Docker image is also available on Docker Hub.
- CentOS 7
- Open JDK 8u342-b07
- Hadoop 3.1.2
- Spark 2.4.7
Name | Value | Description
---|---|---
`KRB_REALM` | `EXAMPLE.COM` | The Kerberos realm
`DOMAIN_REALM` | `example.com` | The Kerberos domain realm
`KERBEROS_ADMIN` | `admin/admin` | The KDC admin user
`KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password
`KERBEROS_ROOT_USER_PASSWORD` | `password` | The password of the Kerberos principal `root`, which maps to the OS root user
`HADOOP_USER` | `hadoop` | The default user for running Hadoop
You can simply define these variables in the `docker-compose.yml`.
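For illustration, a minimal sketch of what the environment block might look like; the service name `hadoop` is an assumption, not necessarily the name used in this project's compose file:

```yaml
services:
  hadoop:                        # service name is an assumption
    environment:
      KRB_REALM: EXAMPLE.COM
      DOMAIN_REALM: example.com
      KERBEROS_ADMIN: admin/admin
      KERBEROS_ADMIN_PASSWORD: admin
      KERBEROS_ROOT_USER_PASSWORD: password
      HADOOP_USER: hadoop
```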
The default user:group for Spark jobs is `hadoop:hadoop`.
Clone the GitHub project and run:

```sh
docker-compose up -d
```
As an alternative, you can use Testcontainers for testing; you will have to use the provided docker-compose file or set up the same configuration programmatically.
Get the container name with `docker ps` and log in to the container with:

```sh
docker exec -it <container-name> /bin/bash
```
To obtain a Kerberos ticket, execute:

```sh
kinit <username> -k -t ${KEYTAB_DIR}/keytab.name
```
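To confirm the ticket was granted, `klist` prints the current ticket cache:

```sh
klist
```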
Afterwards you can use the `hdfs` CLI, for example:

```sh
hdfs dfs -ls /
```
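A few further commands to exercise the Kerberized filesystem; the paths are illustrative:

```sh
# Create a directory, upload a local file, and read it back
hdfs dfs -mkdir -p /tmp/test
hdfs dfs -put /etc/hosts /tmp/test/
hdfs dfs -cat /tmp/test/hosts
```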
Run a `spark-submit` job that writes files to or reads files from HDFS; a sketch follows.
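As a smoke test, the bundled SparkPi example can be submitted to YARN. The jar location and a set `SPARK_HOME` are assumptions about the image layout; SparkPi itself does not read or write application data in HDFS, but its YARN staging files do land there, so the Kerberos/HDFS path is still exercised:

```sh
# Submit the SparkPi example to YARN (jar path is an assumption)
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  ${SPARK_HOME}/examples/jars/spark-examples_*.jar 10
```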
`parquet-tools` is installed for convenience. There is an alias `parquet-tools` and a `parquet-tools.sh` script for non-interactive usage (for example, in shell scripts).
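For illustration, two typical invocations; the file path is hypothetical, and reading `hdfs://` URIs assumes the bundled parquet-tools build picks up the Hadoop configuration:

```sh
# Print the schema of a Parquet file (path is hypothetical)
parquet-tools schema hdfs:///tmp/test/data.parquet
# Non-interactive variant for use in scripts
parquet-tools.sh cat hdfs:///tmp/test/data.parquet
```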
If the keystore has expired, create a new `keystore.jks`:
- Create a private key:

```sh
openssl genrsa -des3 -out server.key 1024
```

- Create a CSR:

```sh
openssl req -new -key server.key -out server.csr
```

- Remove the passphrase from the key:

```sh
cp server.key server.key.org
openssl rsa -in server.key.org -out server.key
```

- Create a self-signed certificate:

```sh
openssl x509 -req -days 3650 -in server.csr -signkey server.key -out server.crt
```

- Create the JKS and import the certificate. Set the password `bigdata`:

```sh
keytool -import -keystore keystore.jks -alias CARoot -file server.crt
```
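To verify the result, list the keystore contents (you will be prompted for the keystore password):

```sh
keytool -list -v -keystore keystore.jks
```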
Some docs:
- https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
- https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/ClusterSetup.html
- https://www.linode.com/docs/guides/install-configure-run-spark-on-top-of-hadoop-yarn-cluster/