Dockerfiles for Apache Spark.
The image is available directly from https://index.docker.io.
This image contains the following software:
- OpenJDK 64-Bit v1.8.0_131
- Scala v2.10.6
- SBT v0.13.15
- Apache Spark v1.6.2
There are 2 ways of getting this image:
- Build this image using the Dockerfile, OR
- Pull the image directly from DockerHub.
Copy the Dockerfile to a folder on your local machine and then invoke the following command.
docker build -t p7hb/p7hb-docker-spark:1.6.2 .
docker pull p7hb/p7hb-docker-spark:1.6.2
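The two ways of getting the image can be combined in one helper that tries the pull first and falls back to a local build. This is a minimal sketch, not part of the original project: the function name get_spark_image is hypothetical, and it assumes the Dockerfile sits in the directory passed as the first argument (the current directory by default).

```shell
# Image tag taken from the build and pull commands above.
IMAGE="p7hb/p7hb-docker-spark:1.6.2"

# Hypothetical helper: pull the image, or build it locally if the pull fails.
# $1 = folder containing the Dockerfile (default: current directory).
get_spark_image() {
  local context="${1:-.}"
  if docker pull "$IMAGE"; then
    echo "pulled $IMAGE"
  else
    echo "pull failed, building $IMAGE from $context" >&2
    docker build -t "$IMAGE" "$context"
  fi
}
```

Calling get_spark_image with no arguments uses the current folder as the build context if the pull does not succeed.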
docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/p7hb-docker-spark:1.6.2
The above step will launch and run the image with:
- root is the user we are logged in as.
- spark is the container name.
- spark is the host name of this container.
  - This is very important, as Spark Slaves are started using this host name as the master.
- The container exposes ports 4040, 8080, 8081 for the Spark Web UI console(s).
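Because the Slaves depend on the container's host name, it can be worth verifying it from the Docker host before starting Spark. The following is a small sketch using docker inspect and docker port. The function name check_spark_container is hypothetical, and the defaults assume the container name and host name spark used in the run command.

```shell
# Hypothetical helper: confirm the container's host name matches the one the
# Spark master URL will use, then list the published ports.
# $1 = container name (default: spark), $2 = expected host name (default: spark).
check_spark_container() {
  local name="${1:-spark}" expected="${2:-spark}" actual
  actual="$(docker inspect --format '{{.Config.Hostname}}' "$name")" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "host name is '$actual', expected '$expected'" >&2
    return 1
  fi
  docker port "$name"
}
```

A non-zero exit status means either the container could not be inspected or its host name differs from the expected one.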
root@spark:~# hostname
spark
root@spark:~# java -version
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-8u102-b14.1-1~bpo8+1-b14)
OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
root@spark:~# scala -version
Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL
Running sbt about will download and set up SBT on the image.
root@spark:~# spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/
Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.8.0_102)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
scala>
All the required binaries have been added to the PATH.
start-master.sh
start-slave.sh spark://spark:7077
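If these two commands are scripted rather than typed by hand, it can help to wait until the master's Web UI on port 8080 responds before starting the Slave. The sketch below polls a URL with curl. The function name wait_for_url is hypothetical, and it assumes curl is installed in the container, which is not stated above.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 1).
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-1}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failures, -o /dev/null: discard the body.
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

One possible use is start-master.sh && wait_for_url http://spark:8080 && start-slave.sh spark://spark:7077, so the Slave is only started once the master answers.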
spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark:7077 $SPARK_HOME/lib/spark-examples*.jar 100
OR even simpler
$SPARK_HOME/bin/run-example SparkPi 100
Please note that the first command above expects the Spark Master and Slave to be running, and we can check the Spark Web UI after executing it. With the second command, this is not possible.
spark-shell --master spark://spark:7077
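The master URL in these commands is built from the container's host name and port 7077. As an illustration, a tiny helper can derive it instead of hard-coding it. The function name spark_master_url is hypothetical; the default port 7077 is the one used in the commands above.

```shell
# Hypothetical helper: build the standalone master URL from a host name.
# $1 = host name (default: this machine's host name), $2 = port (default: 7077).
spark_master_url() {
  local host="${1:-$(hostname)}" port="${2:-7077}"
  echo "spark://${host}:${port}"
}
```

Inside the container, spark-shell --master "$(spark_master_url)" would then be equivalent to the command above, since the host name there is spark.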
The Spark application Web UI on port 4040 is only available for the duration of the application.
The following command prints the IP address that must be used to reach all the exposed ports of our Docker container.
docker-machine ip default
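Given that IP address, the Web UI addresses follow directly from the exposed ports. A minimal sketch: the function name spark_ui_urls is hypothetical, and the role assigned to each port assumes Spark's standalone defaults (8080 for the Master, 8081 for the Worker, 4040 for a running application).

```shell
# Hypothetical helper: print the Spark Web UI URLs for a given Docker host IP.
# Port roles assume Spark standalone defaults.
spark_ui_urls() {
  local ip="$1"
  echo "Master UI:      http://${ip}:8080"
  echo "Worker UI:      http://${ip}:8081"
  echo "Application UI: http://${ip}:4040"
}
```

For example, spark_ui_urls "$(docker-machine ip default)" prints the three URLs for the default Docker machine.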
docker ps
docker ps -a
docker stats --all shows a running list of containers.
docker inspect <<Container_Name>> | grep IPAddress
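The grep above can print more than one matching line. As an alternative sketch, docker inspect accepts a --format template that prints only the address itself. The function name container_ip is hypothetical.

```shell
# Hypothetical helper: print only the container's IP address(es),
# one per network the container is attached to, without surrounding JSON.
# $1 = container name or ID.
container_ip() {
  docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}
```

For the container started above, container_ip spark prints its address on the network it is attached to.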
We can open a new terminal with a new instance of the container's shell using the following command.
docker exec -it <<Container_ID>> /bin/bash #by Container ID
OR
docker exec -it <<Container_Name>> /bin/bash #by Container Name
If you find any issues or would like to discuss further, please ping me on my Twitter handle @P7h or drop me an email.
Copyright © 2016 Prashanth Babu.
Licensed under the Apache License, Version 2.0.