-
Notifications
You must be signed in to change notification settings - Fork 51
/
Copy pathREADME.zeppelin
155 lines (112 loc) · 6.04 KB
/
README.zeppelin
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
Instructions For Using Zeppelin
------------------------------
0) If necessary, download your favorite version of Zeppelin from the
Apache download site and install it into a location where it's
accessible on all cluster nodes. Usually this is on a NFS home
directory.
See 'Convenience Scripts' in README about
misc/magpie-download-and-setup.sh, which may make the
downloading and patching easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts using srun in script-sbatch-srun, Moab Msub+Slurm
scripts using srun in script-msub-slurm-srun, Moab Msub+Torque
scripts using pdsh in script-msub-torque-pdsh, LSF scripts using
mpirun in script-lsf-mpirun, and Flux scripts in
script-flux-batch-run.
You'll likely want to start with the base spark script
(e.g. magpie.sbatch-srun-spark) or spark w/ zeppelin
(e.g. magpie.sbatch-srun-spark-with-zeppelin) for your
scheduler/resource manager. If you wish to configure more, you can
choose to start with the base script (e.g. magpie.sbatch-srun)
which contains all configuration options.
2) Setup your job essentials at the top of the submission script. As
an example, the following are the essentials for running with Moab.
#MSUB -l nodes : Set how many nodes you want in your job
#MSUB -l walltime : Set the time for this job to run
#MSUB -l partition : Set the job partition
#MSUB -q <my batch queue> : Set to batch queue
MOAB_JOBNAME : Set your job name.
MAGPIE_SCRIPTS_HOME : Set where your scripts are
MAGPIE_LOCAL_DIR : For scratch space files
MAGPIE_JOB_TYPE : This should be set to 'zeppelin' initially
JAVA_HOME : B/c you need to ...
3) Setup the essentials for Zeppelin.
ZEPPELIN_SETUP : Set to yes
ZEPPELIN_VERSION : Set appropriately.
ZEPPELIN_HOME : Where your Zeppelin code is. Typically in an NFS mount.
ZEPPELIN_LOCAL_DIR : A small place for conf files and log files local to
each node. Typically /tmp directory.
ZEPPELIN_NOTEBOOK_USERS : Users, passwords and roles able to access the
notebook
4) Select how your job will run by setting MAGPIE_JOB_TYPE and/or
ZEPPELIN_JOB. Initially, you'll likely want to set MAGPIE_JOB_TYPE
to 'zeppelin' and setting ZEPPELIN_JOB to 'checkzeppelinup'. This
will allow you to run a pre-written job to make sure things are
setup correctly.
After this, you may want to run with MAGPIE_JOB_TYPE set to
'interactive' to play around and figure things out. In the job
output you will see output similar to the following:
* To access Zeppelin's web interface, you'll want to navigate your browser to:
* http://some-node:18080
If you point your browser to this node and port you can access the
Zeppelin Notebook/Interpreter. From here you can create notebooks
and run code.
A quick example once you navigate to the webpage:
- Click the link "Create new note"
- In the first paragraph add the following:
%spark
val count = sc.parallelize(1 to 1000).map{i =>
val x = Math.random()
val y = Math.random()
if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / 1000)
- Click the 'play' button in the top right corner of the paragraph.
This should display a value around 3.14 if successful.
See "Exported Environment Variables" in README for information on
common exported environment variables that may be useful in
scripts.
See below in "Zeppelin Exported Environment Variables", for
information on Zeppelin specific exported environment variables that
may be useful in scripts.
5) Zeppelin requires Spark, so ensure Spark is configured and also in
your submission script. See README.spark for setup instructions.
6) Submit your job into the cluster by running "sbatch -k
./magpie.sbatchfile" for Slurm, "msub ./magpie.msubfile" for
Moab, or "bsub < ./magpie.lsffile" for LSF.
Add any other options you see fit.
7) Look at your job output file to see your output. There will also
be some notes/instructions/tips in the output file for viewing the
status of your job in a web browser, environment variables you wish
to set if interacting with it, etc.
See "General Advanced Usage" in README for additional tips.
Zeppelin Exported Environment Variables
---------------------------------------
The following environment variables are exported when your job is run
and may be useful in scripts in your run or in pre/post run scripts.
ZEPPELIN_MASTER_NODE : the master node running Zeppelin
ZEPPELIN_SERVER_PORT : the web server port for Zeppelin
ZEPPELIN_CONF_DIR : the directory that Zeppelin configuration files local
to the node are stored
ZEPPELIN_LOG_DIR : the directory Zeppelin log files are stored
See "Spark Exported Environment Variables" in README.spark,
for Spark environment variables that may be useful.
See "Zookeeper Exported Environment Variables" in README.zookeeper,
for Zookeeper environment variables that may be useful.
Notes
1) If ZEPPELIN_NOTEBOOK_ENCRYPT_PASSWORD is marked as true, do the following
to create the password for ZEPPELIN_NOTEBOOK_USERS. Make sure that the
shiro-tools-hasher is in the classpath. It can be downloaded where X.X.X
matches shiro-core that is used in the selected zeppelin version. This can
be found by looking at the jar file located at:
zeppelin-Y.Y.Y-bin-all/lib/shiro-core-X.X.X.jar.
The download path where X.X.X matches shiro-core's X.X.X:
http://repo1.maven.org/maven2/org/apache/shiro/tools/
shiro-tools-hasher/X.X.X/shiro-tools-hasher-X.X.X-cli.jar
To create the password run:
$JAVA_HOME/bin/java -jar shiro-tools-hasher-X.X.X-cli.jar -p
This will create a hash like such:
$shiro1$SHA-256$500000$xwGfyokCGZPdGLrIOfsfkg==$/7jglXzJHINptKRRDd8x1BAnJfAlGlKwFEN4yIwuAI0=
Make sure to escape the '$' in the string or use single quotes.
This hash should replace the password for a user in ZEPPELIN_NOTEBOOK_USERS.