Instructions For Spark
----------------------
0) If necessary, download your favorite version of Spark from the
Apache download site and install it into a location where it's
accessible on all cluster nodes. Usually this is on an NFS home
directory.
See below in 'Spark Patching' about patches that are necessary for
Spark and patches that may be necessary depending on your
environment and Spark version.
See 'Convenience Scripts' in README about
misc/magpie-download-and-setup.sh, which may make the
downloading and patching easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts using srun in script-sbatch-srun, Moab Msub+Slurm
scripts using srun in script-msub-slurm-srun, Moab Msub+Torque
scripts using pdsh in script-msub-torque-pdsh, LSF scripts using
mpirun in script-lsf-mpirun, and Flux scripts in
script-flux-batch-run.
You'll likely want to start with the base spark script
(e.g. magpie.sbatch-srun-spark) or spark w/ hdfs
(e.g. magpie.sbatch-srun-spark-with-hdfs) for your
scheduler/resource manager (other script starters include yarn or yarn
& hdfs). If you wish to configure more, you can choose to start
with the base script (e.g. magpie.sbatch-srun) which contains all
configuration options.
Note that you can run Spark without HDFS; files can be accessed
normally through "file://<path>".
2) Setup your job essentials at the top of the submission script. As
an example, the following are the essentials for running with Moab
(a rough sketch of these settings follows the list).
#MSUB -l nodes : Set how many nodes you want in your job
#MSUB -l walltime : Set the time for this job to run
#MSUB -l partition : Set the job partition
#MSUB -q <my batch queue> : Set to batch queue
MOAB_JOBNAME : Set your job name.
MAGPIE_SCRIPTS_HOME : Set where your scripts are
MAGPIE_LOCAL_DIR : For scratch space files
MAGPIE_JOB_TYPE : This should be set to 'spark' initially
JAVA_HOME : Set to your Java installation; Java is required to run Spark.
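As a rough sketch, the top of an Msub submission script might look
like the following (the node count, walltime, partition, queue, and
paths are illustrative placeholders; consult the comments in the
actual submission script for the exact form each setting takes):

   #!/bin/sh
   #MSUB -l nodes=9
   #MSUB -l walltime=01:30:00
   #MSUB -l partition=mypartition
   #MSUB -q pbatch

   export MOAB_JOBNAME="test"
   export MAGPIE_SCRIPTS_HOME="${HOME}/magpie"
   export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"
   export MAGPIE_JOB_TYPE="spark"
   export JAVA_HOME="/usr/lib/jvm/jre-1.7.0-oracle.x86_64/"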
3) Setup the essentials for Spark.
SPARK_SETUP : Set to yes
SPARK_SETUP_TYPE : Set if you want to run with or without Yarn.
Most users will want to set to "STANDALONE".
SPARK_VERSION : Set appropriately.
SPARK_HOME : Where your Spark code is. Typically in an NFS
mount.
SPARK_LOCAL_DIR : A small place for conf files and log files local
to each node. Typically a directory under /tmp.
SPARK_LOCAL_SCRATCH_DIR : A scratch directory for Spark to use. If
a local SSD/NVRAM is available, it would be preferable to set this
to that path.
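A rough sketch of these Spark essentials in the submission script
(the version string and paths are assumptions to adapt to your
installation and site filesystems):

   export SPARK_SETUP=yes
   export SPARK_SETUP_TYPE="STANDALONE"
   export SPARK_VERSION="1.6.2-bin-hadoop2.6"
   export SPARK_HOME="${HOME}/spark-${SPARK_VERSION}"
   export SPARK_LOCAL_DIR="/tmp/${USER}/spark"
   export SPARK_LOCAL_SCRATCH_DIR="/p/lscratchg/${USER}/sparkscratch"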
4) Select how your job will run by setting MAGPIE_JOB_TYPE and/or
SPARK_JOB. Initially, you'll likely want to set MAGPIE_JOB_TYPE to
'spark' and SPARK_JOB to 'sparkpi'. This will allow you to run a
pre-written job to make sure things are set up correctly.
After this, you may want to run with MAGPIE_JOB_TYPE set to
'interactive' to play around and figure things out. In the job
output you will see output similar to the following:
ssh node70
setenv JAVA_HOME "/usr/lib/jvm/jre-1.7.0-oracle.x86_64/"
setenv SPARK_HOME "/home/username/spark-1.6.2-bin-hadoop2.6"
setenv SPARK_CONF_DIR "/tmp/username/spark/ajobname/1174962/conf"
These instructions will inform you how to log in to the master
node of your allocation and how to initialize your session. Once in
your session, you can do as you please. For example, you can run a
job using spark-class (bin/spark-class ...). There will also be
instructions in your job output on how to tear the session down
cleanly if you wish to end your job early.
Once you have figured out how you wish to run your job, you will
likely want to run with MAGPIE_JOB_TYPE set to 'script' mode.
Create a script that will run your job/calculation automatically,
set it in MAGPIE_JOB_SCRIPT, and then run your job. You can find
an example job script in examples/spark-example-job-script.
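As a rough sketch (not the contents of the shipped example, which
may differ), such a job script can be a small shell script that
submits your application against the Magpie-provided master; the
class name and jar path below are placeholders:

   #!/bin/sh
   # Submit the user application to the Spark master Magpie started.
   $SPARK_HOME/bin/spark-submit \
       --master spark://${SPARK_MASTER_NODE}:${SPARK_MASTER_PORT} \
       --class org.myorg.MyApp \
       ${HOME}/myapp/myapp.jar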
See "Exported Environment Variables" in README for information on
common exported environment variables that may be useful in
scripts.
See below in "Spark Exported Environment Variables" for
information on Spark-specific exported environment variables that
may be useful in scripts.
5) Spark does not require HDFS, but many choose to use it. If you do,
setup Hadoop w/ HDFS in your submission script. See README.hadoop
for Hadoop setup instructions. Simply use the prefix "hdfs://" or
"file://" appropriately for the filesystem you will access files
from.
You may wish to run with SPARK_JOB set to 'sparkwordcount' to test
the HDFS setup.
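For example, the same application could read its input from either
filesystem just by changing the URI scheme. A sketch, assuming the
application takes an input path as its first argument (the paths,
class, and jar are placeholders):

   # Input stored in HDFS (requires HDFS setup in the submission script)
   $SPARK_HOME/bin/spark-submit --class <class> <jar> hdfs:///user/${USER}/input
   # Input read directly from Lustre/NFS, no HDFS needed
   $SPARK_HOME/bin/spark-submit --class <class> <jar> file:///p/lscratchg/${USER}/input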
6) Submit your job into the cluster by running "sbatch -k
./magpie.sbatchfile" for Slurm, "msub ./magpie.msubfile" for
Moab, or "bsub < ./magpie.lsffile" for LSF.
Add any other options you see fit.
7) Look at your job output file to see your output. There will also
be some notes/instructions/tips in the output file for viewing the
status of your job in a web browser, environment variables you may
wish to set when interacting with it, etc.
See "General Advanced Usage" in README for additional tips.
See below in "Spark Advanced Usage" for additional Spark tips.
Spark Exported Environment Variables
------------------------------------
The following environment variables are exported when your job is run
and may be useful in scripts in your run or in pre/post run scripts.
SPARK_MASTER_NODE : the master node of the Spark allocation. Often
used for launching Spark jobs
(e.g. spark://${SPARK_MASTER_NODE}:${SPARK_MASTER_PORT})
SPARK_MASTER_PORT : the master port for running Spark jobs. Often
used for launching Spark jobs
(e.g. spark://${SPARK_MASTER_NODE}:${SPARK_MASTER_PORT})
Not exported if using Yarn.
SPARK_WORKER_COUNT : number of compute/data nodes in your allocation
for Spark. May be useful for adjusting run time
options such as parallelism count.
SPARK_WORKER_CORE_COUNT : Total cores on worker nodes in the allocation.
May be useful for adjusting run time options
such as parallelism count.
SPARK_CONF_DIR : the directory in which node-local Spark
                 configuration files are stored.
SPARK_LOG_DIR : the directory in which Spark log files are stored
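For example, extending the job script sketch from step 4 above, a
script might scale its partition count to the allocation and point
spark-submit at the Magpie-provided master (the 4x multiplier, class
name, and jar path are illustrative assumptions):

   # Hypothetical sizing: four partitions per core in the allocation.
   PARTITIONS=$(( SPARK_WORKER_CORE_COUNT * 4 ))
   $SPARK_HOME/bin/spark-submit \
       --master spark://${SPARK_MASTER_NODE}:${SPARK_MASTER_PORT} \
       --conf spark.default.parallelism=${PARTITIONS} \
       --class org.myorg.MyApp ${HOME}/myapp/myapp.jar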
See "Hadoop Exported Environment Variables" in README.hadoop, for
Hadoop environment variables that may be useful.
Example Job Output for Spark running SparkPi
--------------------------------------------
The following is an example job output of Magpie running Spark and
running SparkPi. This is run over HDFS over Lustre. Sections of
extraneous text have been left out.
While this output is specific to using Magpie with Spark, the output
when using Hadoop, Storm, HBase, etc. is not all that different.
1) First we get some details of the job
*******************************************************
* Magpie General Job Info
*
* Job Nodelist: apex[203-211]
* Job Nodecount: 9
* Job Timelimit in Minutes: 90
* Job Name: test
* Job ID: 1174575
*
*******************************************************
2) Next, Spark begins to launch and start up daemons on all cluster nodes.
Starting spark
starting org.apache.spark.deploy.master.Master, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.master.Master-1-apex217.out
apex224: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex224.out
apex222: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex222.out
apex218: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex218.out
apex225: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex225.out
apex219: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex219.out
apex223: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex223.out
apex220: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex220.out
apex221: starting org.apache.spark.deploy.worker.Worker, logging to /tmp/achu/spark/test/1174784/log/spark-achu-org.apache.spark.deploy.worker.Worker-1-apex221.out
Waiting 30 seconds to allow Spark daemons to setup
3) Next, we see output with details of the Spark setup. You'll find
addresses indicating web services you can access to get detailed
job information. You'll also find information about how to log in
to access Spark directly and how to shut down the job early if you
so desire.
*******************************************************
*
* Spark Information
*
* You can view your Spark status by launching a web browser and pointing to ...
*
* Spark Master: http://apex217:8080
* Spark Worker: http://<WORKERNODE>:8081
* Spark Application Dashboard: http://apex217:4040
*
* The Spark Master for running jobs is
*
* spark://apex217:7077
*
* To access Spark directly, you'll want to:
*
* ssh apex217
* setenv JAVA_HOME "/usr/lib/jvm/jre-1.7.0-oracle.x86_64/"
* setenv SPARK_HOME "/home/achu/hadoop/spark-1.6.2-bin-hadoop2.6"
* setenv SPARK_CONF_DIR "/tmp/achu/spark/test/1174784/conf"
*
* Then you can do as you please. For example to run a job:
*
* $SPARK_HOME/bin/spark-submit --class <class> <jar>
*
* To end/cleanup your session & kill all daemons, login and set
* environment variables per the instructions above, then run:
*
* $SPARK_HOME/sbin/stop-all.sh
*
*******************************************************
4) Then the SparkPi job is run
Running bin/run-example org.apache.spark.examples.SparkPi 8
16/07/18 23:28:32 INFO SparkContext: Running Spark version 1.6.2
16/07/18 23:28:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/18 23:28:32 INFO SecurityManager: Changing view acls to: achu
16/07/18 23:28:32 INFO SecurityManager: Changing modify acls to: achu
16/07/18 23:28:32 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(achu); users with modify permissions: Set(achu)
16/07/18 23:28:36 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
16/07/18 23:28:36 INFO Utils: Successfully started service 'sparkDriver' on port 37197.
16/07/18 23:28:36 INFO Slf4jLogger: Slf4jLogger started
16/07/18 23:28:36 INFO Remoting: Starting remoting
16/07/18 23:28:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:56249]
16/07/18 23:28:36 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 56249.
16/07/18 23:28:36 INFO SparkEnv: Registering MapOutputTracker
16/07/18 23:28:36 INFO SparkEnv: Registering BlockManagerMaster
16/07/18 23:28:36 INFO DiskBlockManager: Created local directory at /p/lscratchg/achu/testing/sparkscratch/node-0/blockmgr-370e9495-d9eb-41f8-ba84-ca5e643f7375
16/07/18 23:28:36 INFO MemoryStore: MemoryStore started with capacity 35.7 GB
16/07/18 23:28:39 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/18 23:28:39 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/18 23:28:39 INFO SparkUI: Started SparkUI at http://192.168.123.217:4040
16/07/18 23:28:39 INFO HttpFileServer: HTTP File server directory is /p/lscratchg/achu/testing/sparkscratch/node-0/spark-1456877b-4bf0-4c44-869f-b0a6f96c5546/httpd-9f583943-6701-40ca-8556-5537552a5b33
16/07/18 23:28:39 INFO HttpServer: Starting HTTP Server
16/07/18 23:28:39 INFO Utils: Successfully started service 'HTTP file server' on port 42304.
16/07/18 23:28:41 INFO SparkContext: Added JAR file:/home/achu/hadoop/spark-1.6.2-bin-hadoop2.6/lib/spark-examples-1.6.2-hadoop2.6.0.jar at http://192.168.123.217:42304/jars/spark-examples-1.6.2-hadoop2.6.0.jar with timestamp 1468909721269
16/07/18 23:28:41 INFO AppClient$ClientEndpoint: Connecting to master spark://apex217:7077...
16/07/18 23:28:41 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160718232841-0000
16/07/18 23:28:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41539.
16/07/18 23:28:41 INFO NettyBlockTransferService: Server created on 41539
16/07/18 23:28:41 INFO BlockManagerMaster: Trying to register BlockManager
16/07/18 23:28:41 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.123.217:41539 with 35.7 GB RAM, BlockManagerId(driver, 192.168.123.217, 41539)
16/07/18 23:28:41 INFO BlockManagerMaster: Registered BlockManager
16/07/18 23:28:41 INFO AppClient$ClientEndpoint: Executor added: app-20160718232841-0000/0 on worker-20160718232805-192.168.123.223-34505 (192.168.123.223:34505) with 16 cores
16/07/18 23:28:41 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160718232841-0000/0 on hostPort 192.168.123.223:34505 with 16 cores, 50.0 GB RAM
16/07/18 23:28:41 INFO AppClient$ClientEndpoint: Executor added: app-20160718232841-0000/1 on worker-20160718232805-192.168.123.224-49414 (192.168.123.224:49414) with 16 cores
16/07/18 23:28:41 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160718232841-0000/1 on hostPort 192.168.123.224:49414 with 16 cores, 50.0 GB RAM
16/07/18 23:28:41 INFO AppClient$ClientEndpoint: Executor added: app-20160718232841-0000/2 on worker-20160718232805-192.168.123.220-35442 (192.168.123.220:35442) with 16 cores
16/07/18 23:28:41 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160718232841-0000/2 on hostPort 192.168.123.220:35442 with 16 cores, 50.0 GB RAM
16/07/18 23:28:41 INFO AppClient$ClientEndpoint: Executor added: app-20160718232841-0000/3 on worker-20160718232805-192.168.123.222-50299 (192.168.123.222:50299) with 16 cores
16/07/18 23:28:41 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160718232841-0000/3 on hostPort 192.168.123.222:50299 with 16 cores, 50.0 GB RAM
16/07/18 23:28:41 INFO AppClient$ClientEndpoint: Executor added: app-20160718232841-0000/4 on worker-20160718232806-192.168.123.225-55852 (192.168.123.225:55852) with 16 cores
16/07/18 23:28:41 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160718232841-0000/4 on hostPort 192.168.123.225:55852 with 16 cores, 50.0 GB RAM
[... additional executor and scheduler output elided ...]
16/07/18 23:28:53 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, apex219.llnl.gov, partition 0,PROCESS_LOCAL, 2158 bytes)
16/07/18 23:28:53 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, apex219.llnl.gov, partition 1,PROCESS_LOCAL, 2158 bytes)
16/07/18 23:28:53 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, apex219.llnl.gov, partition 2,PROCESS_LOCAL, 2158 bytes)
16/07/18 23:28:53 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, apex219.llnl.gov, partition 3,PROCESS_LOCAL, 2158 bytes)
16/07/18 23:28:53 INFO TaskSetManager: Starting task 4.0 in stage 0.0 ...
[... task progress output elided ...]
16/07/18 23:28:59 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 5259 ms on apex219.llnl.gov (7/8)
16/07/18 23:28:59 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 5259 ms on apex219.llnl.gov (8/8)
16/07/18 23:28:59 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 16.885 s
16/07/18 23:28:59 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/07/18 23:28:59 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 17.096651 s
Pi is roughly 3.1385
16/07/18 23:28:59 INFO SparkUI: Stopped Spark web UI at http://192.168.123.217:4040
16/07/18 23:28:59 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/07/18 23:28:59 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/07/18 23:28:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/18 23:28:59 INFO MemoryStore: MemoryStore cleared
16/07/18 23:28:59 INFO BlockManager: BlockManager stopped
16/07/18 23:28:59 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/18 23:28:59 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/18 23:28:59 WARN Dispatcher: Message RemoteProcessDisconnected(apex217:7077) dropped.
16/07/18 23:28:59 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/18 23:28:59 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/18 23:28:59 INFO SparkContext: Successfully stopped SparkContext
16/07/18 23:28:59 INFO ShutdownHookManager: Shutdown hook called
16/07/18 23:28:59 INFO ShutdownHookManager: Deleting directory /p/lscratchg/achu/testing/sparkscratch/node-0/spark-1456877b-4bf0-4c44-869f-b0a6f96c5546/httpd-9f583943-6701-40ca-8556-5537552a5b33
16/07/18 23:28:59 INFO ShutdownHookManager: Deleting directory /p/lscratchg/achu/testing/sparkscratch/node-0/spark-1456877b-4bf0-4c44-869f-b0a6f96c5546
The Pi approximation (3.1385) can be seen in the output above.
5) With the job complete, Magpie now tears down the session and cleans
up all daemons.
Stopping spark
apex222: stopping org.apache.spark.deploy.worker.Worker
apex219: stopping org.apache.spark.deploy.worker.Worker
apex218: stopping org.apache.spark.deploy.worker.Worker
apex221: stopping org.apache.spark.deploy.worker.Worker
apex223: stopping org.apache.spark.deploy.worker.Worker
apex225: stopping org.apache.spark.deploy.worker.Worker
apex220: stopping org.apache.spark.deploy.worker.Worker
apex224: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
Spark Patching
--------------
- A patch to support alternate config file directories is required.
- A patch to support non-ssh remote execution may be needed in some
environments. The alternate remote execution command must be
specified in the environment variable MAGPIE_REMOTE_CMD.
- For versions >= 1.1.0, a single patch covering the "alternates"
listed above can be applied directly to the startup scripts, with no
recompilation of source needed. These patches can be found in the
patches/spark/ directory with the suffix "alternate.patch". This
patch should be applied first.
- If MAGPIE_NO_LOCAL_DIR support is desired, patches with the
"no-local-dir.patch" suffix in the filename can be found in
patches/spark/. See README.no-local-dir for more details. This
patch should be applied second, after the "alternate" patch.
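A rough sketch of applying these patches by hand (the patch
filenames below are illustrative; the exact names vary by Spark
version, so check patches/spark/ for the ones matching your download
and adjust the -p level if needed; misc/magpie-download-and-setup.sh
can also handle this for you):

   cd ${HOME}/spark-1.6.2-bin-hadoop2.6
   # Apply the "alternate" patch first ...
   patch -p1 < ${MAGPIE_SCRIPTS_HOME}/patches/spark/spark-1.6.2-bin-hadoop2.6-alternate.patch
   # ... then the no-local-dir patch, if MAGPIE_NO_LOCAL_DIR support is desired.
   patch -p1 < ${MAGPIE_SCRIPTS_HOME}/patches/spark/spark-1.6.2-bin-hadoop2.6-no-local-dir.patch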
Spark Troubleshooting
---------------------
1) What does the error "Initial job has not accepted any resources;
check your cluster UI to ensure that workers are registered and have
sufficient resources" mean?
This means that the Spark job you submitted requests more resources
than your allocation provides. For example, you may be requesting
more memory or CPUs than Spark can schedule.
Incorrect settings may occur in several ways, such as the
--executor-memory or --executor-cores options in spark-submit or
the SPARK_JOB_MEMORY environment variable in the job submission
script.
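For example, with workers offering 16 cores and 50G of memory per
node (as in the example job output above), the first request can be
scheduled while the second can never be satisfied and will hang with
the warning above (class and jar are placeholders):

   # Fits within a 16-core / 50G worker
   $SPARK_HOME/bin/spark-submit --executor-cores 8 --executor-memory 24G --class <class> <jar>
   # Exceeds any single worker and will never be scheduled
   $SPARK_HOME/bin/spark-submit --executor-cores 32 --executor-memory 64G --class <class> <jar>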
Spark Advanced Usage
---------------------
1) If your cluster has a local SSD/NVRAM on each node, set a path to
it via the SPARK_LOCAL_SCRATCH_DIR environment variable in your
submission scripts.
Setting this to local SSD/NVRAM serves two purposes. First, the
local scratch directory is used for spillover of map outputs during
shuffles, which can greatly improve shuffle performance. Second, it
can be used for quickly storing/caching RDDs to disk using the
MEMORY_AND_DISK and/or DISK_ONLY persistence levels.
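In the submission script this is just a path setting; the /ssd
mount point below is a hypothetical example of node-local storage,
in contrast to a parallel-filesystem path such as the Lustre
directory used in the example job output above:

   # Prefer node-local SSD/NVRAM when the cluster provides it.
   export SPARK_LOCAL_SCRATCH_DIR="/ssd/${USER}/sparkscratch"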
2) Magpie configures the default parallelism in a Spark job depending
on the Spark version. In versions < 1.3.0, Magpie defaults the
parallelism to the number of compute nodes in your allocation.
This is significantly superior to the original Spark default of 8
(pre-1.0). However, it may not be optimal for many jobs.
For versions >= 1.3.0, Magpie defaults to not setting this value.
Spark will instead use a default parallelism equal to the number of
cores in your allocation. In some cases, this number will be too
high and will cause too much overhead for your job.
Users should experiment with the parallelism of their job to
improve performance. A number of Spark operations (such as
reduceByKey) accept a partition count as an argument. The default
can be adjusted in the submission scripts via the
SPARK_DEFAULT_PARALLELISM environment variable.
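For example, in the submission script (the value is purely
illustrative; tune it for your job and allocation size):

   # Hypothetical starting point: a few partitions per core in the allocation.
   export SPARK_DEFAULT_PARALLELISM=256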
3) Magpie configures a relatively conservative amount of memory for
Spark, currently 80% of system memory. While there should always
be a buffer left so the operating system, system daemons, and
Spark (and potentially Hadoop HDFS) daemons can operate, the 80%
value may be on the conservative side; users wishing to push it
higher, to 90% or 95% of system memory, may see benefits.
Users can adjust the amount of memory used by each Spark Worker
through the SPARK_WORKER_MEMORY_PER_NODE environment variable in
the submission scripts.
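For example, in the submission script (the value is illustrative
and assumes it is given in megabytes; check the comments in your
submission script for the exact expected units):

   # Roughly 90% of a 128G node, assuming the value is in megabytes.
   export SPARK_WORKER_MEMORY_PER_NODE=117964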
4) There are two major memory fraction configuration variables,
SPARK_STORAGE_MEMORY_FRACTION and SPARK_SHUFFLE_MEMORY_FRACTION,
which may have major effects on performance depending on your job.
SPARK_STORAGE_MEMORY_FRACTION controls the percentage of memory
used for the memory cache while SPARK_SHUFFLE_MEMORY_FRACTION
controls the percentage used for shuffles.
You may wish to adjust these for your specific job, as they can
have a large influence on job performance. Please see submission
scripts for more information.
Note that beginning with Spark 1.6.0, these memory fractions have
been deprecated.
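For Spark versions before 1.6.0, a sketch of adjusting these in the
submission script (the values are purely illustrative and should be
tuned for your job):

   # Hypothetical split: half of memory for cached RDDs, 30% for shuffles.
   export SPARK_STORAGE_MEMORY_FRACTION=0.5
   export SPARK_SHUFFLE_MEMORY_FRACTION=0.3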
Known Issues
------------
From at least Spark 2.0.0 to Spark 2.1.1, using Yarn with RawnetworkFS
worked within Magpie and its testing. This test broke in Spark 2.2.0.
Upstream issue: https://issues.apache.org/jira/browse/SPARK-21570
The suggestion is that Spark w/ Yarn and non-HDFS is not supported.
Using the Spark standalone scheduler, Spark w/ RawnetworkFS still
works.
Notes
-----
Beginning in Spark 3.1.1, Python 2.7, 3.4, and 3.5 support was removed.