Instructions For Running Hadoop
-------------------------------
0) If necessary, download your favorite version of Hadoop from the
Apache download site and install it into a location where it's
accessible on all cluster nodes. Usually this is an NFS home
directory.
See below in 'Hadoop Patching' about patches that may be necessary
for Hadoop depending on your environment and Hadoop version.
See 'Convenience Scripts' in README about
misc/magpie-download-and-setup.sh, which may make the
downloading and patching easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts using srun in script-sbatch-srun, Moab Msub+Slurm
scripts using srun in script-msub-slurm-srun, Moab Msub+Torque
scripts using pdsh in script-msub-torque-pdsh, LSF scripts using
mpirun in script-lsf-mpirun, and Flux scripts in
script-flux-batch-run.
You'll likely want to start with the base hadoop script
(e.g. magpie.sbatch-srun-hadoop) for your scheduler/resource
manager. If you wish to configure more, you can choose to start
with the base script (e.g. magpie.sbatch-srun) which contains all
configuration options.
2) Set up your job essentials at the top of the submission script. As
an example, the following are the essentials for running with Moab:
#MSUB -l nodes : Set how many nodes you want in your job
#MSUB -l walltime : Set the time for this job to run
#MSUB -l partition : Set the job partition
#MSUB -q <my batch queue> : Set to batch queue
MOAB_JOBNAME : Set your job name.
MAGPIE_SCRIPTS_HOME : Set where your scripts are
MAGPIE_LOCAL_DIR : For scratch space files
MAGPIE_JOB_TYPE : This should be set to 'hadoop' initially
JAVA_HOME : Point this at your Java installation; Hadoop requires Java.
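For example, the top of a Moab submission script might contain
something like the following sketch; all values and paths here are
hypothetical and should be adjusted for your site:

  #MSUB -l nodes=9
  #MSUB -l walltime=01:30:00
  #MSUB -l partition=mycluster
  #MSUB -q pbatch

  export MAGPIE_SCRIPTS_HOME="${HOME}/magpie"
  export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"
  export MAGPIE_JOB_TYPE="hadoop"
  export JAVA_HOME="/usr/lib/jvm/jre-1.7.0-oracle.x86_64/"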
3) Now set up the essentials for Hadoop.
HADOOP_SETUP : Set to yes
HADOOP_SETUP_TYPE : Set whether you want to run both Yarn & HDFS or
just one of them. Most users will want to set this to "MR".
HADOOP_VERSION : Make sure your build matches HADOOP_SETUP_TYPE
(i.e. don't say you want MapReduce 1 and point to a Hadoop 2.0 build)
HADOOP_HOME : Where your hadoop code is. Typically in an NFS mount.
HADOOP_LOCAL_DIR : A small place for conf files and log files local
to each node. Typically a directory under /tmp.
HADOOP_FILESYSTEM_MODE : Most will likely want "hdfsoverlustre" or
"hdfsovernetworkfs". See below for details on HDFS over Lustre and
HDFS over NetworkFS.
HADOOP_HDFSOVERLUSTRE_PATH or equivalent: For HDFS over Lustre, you
need to set this. If not using HDFS over Lustre, set the
appropriate path for your filesystem mode choice.
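Continuing the hypothetical sketch from step 2, the Hadoop essentials
might look like the following; the paths and version are illustrative
only:

  export HADOOP_SETUP=yes
  export HADOOP_SETUP_TYPE="MR"
  export HADOOP_VERSION="2.7.2"
  export HADOOP_HOME="${HOME}/hadoop/hadoop-${HADOOP_VERSION}"
  export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop"
  export HADOOP_FILESYSTEM_MODE="hdfsoverlustre"
  export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/${USER}/hdfsoverlustre/"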
4) Select how your job will run by setting MAGPIE_JOB_TYPE and/or
HADOOP_JOB. Initially, you'll likely want to set MAGPIE_JOB_TYPE
to 'hadoop' and HADOOP_JOB to 'terasort'. This will allow
you to run a pre-written job to make sure things are setup
correctly.
After this, you may want to run with MAGPIE_JOB_TYPE set to
'interactive' to play around and figure things out. In the job
output you will see output similar to the following:
ssh node70
setenv JAVA_HOME "/usr/lib/jvm/jre-1.7.0-oracle.x86_64/"
setenv HADOOP_HOME "/home/username/hadoop-2.7.2"
setenv HADOOP_CONF_DIR "/tmp/username/hadoop/ajobname/1174570/conf"
These instructions will inform you how to login to the master node
of your allocation and how to initialize your session. Once in
your session, you can do as you please. For example, you can
interact with the Hadoop filesystem (bin/hadoop fs ...) or run a
job (bin/hadoop jar ...). There will also be instructions in your
job output on how to tear the session down cleanly if you wish to
end your job early.
Once you have figured out how you wish to run your job, you will
likely want to run with MAGPIE_JOB_TYPE set to 'script' mode.
Create a script that will run your job/calculation automatically,
set it in MAGPIE_JOB_SCRIPT, and then run your job. You can find
an example job script in examples/hadoop-example-job-script.
See "Exported Environment Variables" in README for information on
common exported environment variables that may be useful in
scripts.
See below in "Hadoop Exported Environment Variables", for
information on Hadoop specific exported environment variables that
may be useful in scripts.
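As a rough sketch, a script set in MAGPIE_JOB_SCRIPT might look like
the following; the jar, example program, and HDFS paths are
hypothetical, while HADOOP_HOME and HADOOP_CONF_DIR are exported by
Magpie before the script runs:

  #!/bin/bash
  # Run an example MapReduce job against data already loaded into HDFS
  cd ${HADOOP_HOME}
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
      wordcount /user/${USER}/input /user/${USER}/output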
5) Submit your job to the cluster by running "sbatch -k
./magpie.sbatchfile" for Slurm, "msub ./magpie.msubfile" for
Moab, or "bsub < ./magpie.lsffile" for LSF.
Add any other options you see fit.
6) Look at your job output file to see your output. There will also
be some notes/instructions/tips in the output file for viewing the
status of your job in a web browser, environment variables you wish
to set if interacting with it, etc.
See "General Advanced Usage" in README for additional tips.
See below in "Hadoop Advanced Usage" for additional Hadoop tips.
Hadoop Exported Environment Variables
-------------------------------------
The following environment variables are exported when your job is run
and may be useful in scripts in your run or in pre/post run scripts.
HADOOP_MASTER_NODE : the master node of the Hadoop allocation
HADOOP_WORKER_COUNT : number of compute/data nodes in your allocation
for Hadoop. May be useful for adjusting run time
options such as reducer count.
HADOOP_WORKER_CORE_COUNT : Total cores on worker nodes in the
allocation. May be useful for adjusting run
time options such as reducer count.
HADOOP_NAMENODE : the master namenode of the Hadoop allocation. Often
used for accessing HDFS when the namenode + port
must be specified in a script.
(e.g. hdfs://${HADOOP_NAMENODE}:${HADOOP_NAMENODE_PORT}/user/...)
Exported only if an HDFS-type file system is used.
HADOOP_NAMENODE_PORT : the port of the namenode. Often used for
accessing HDFS when the namenode + port must be
specified in a script.
(e.g. hdfs://${HADOOP_NAMENODE}:${HADOOP_NAMENODE_PORT}/user/...)
Exported only if an HDFS-type file system is used.
HADOOP_CONF_DIR : the directory in which Hadoop configuration files
local to the node are stored
HADOOP_LOG_DIR : the directory in which Hadoop log files are stored
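For instance, a job script might use these variables to reference HDFS
explicitly and to scale the reduce count with the allocation size. The
jar, class, and paths below are hypothetical, and the -D option
assumes the job parses generic options via ToolRunner:

  # List a directory by addressing the namenode directly
  ${HADOOP_HOME}/bin/hdfs dfs -ls \
      hdfs://${HADOOP_NAMENODE}:${HADOOP_NAMENODE_PORT}/user/${USER}/

  # Use roughly two reducers per worker node
  ${HADOOP_HOME}/bin/hadoop jar myjob.jar MyJob \
      -Dmapreduce.job.reduces=$((HADOOP_WORKER_COUNT * 2)) \
      /user/${USER}/input /user/${USER}/output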
Hadoop Convenience Scripts
--------------------------
The following job scripts may be convenient. They can be run by
setting MAGPIE_JOB_TYPE to 'script' and setting MAGPIE_JOB_SCRIPT
to the script (see the example after this list).
job-scripts/hadoop-rebalance-hdfs-over-lustre-or-hdfs-over-networkfs-if-increasing-nodes-script.sh
- See "Basics of HDFS over Lustre/NetworkFS" section for details.
job-scripts/hadoop-hdfs-fsck-cleanup-corrupted-blocks-script.sh -
Cleanup/remove corrupted blocks in HDFS.
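For example, to run the HDFS fsck cleanup script, the relevant
settings in your submission script might be:

  export MAGPIE_JOB_TYPE="script"
  export MAGPIE_JOB_SCRIPT="${MAGPIE_SCRIPTS_HOME}/job-scripts/hadoop-hdfs-fsck-cleanup-corrupted-blocks-script.sh"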
Example Job Output for Hadoop running Terasort
----------------------------------------------
The following is an example job output of Magpie running Hadoop and
running a Terasort. This is run over HDFS over Lustre. Sections of
extraneous text have been left out.
While this output is specific to using Magpie with Hadoop, the output
when using Spark, Storm, Hbase, etc. is not all that different.
1) First we see that HDFS over Lustre is being setup by formatting the
HDFS Namenode.
*******************************************************
* Performing Post Setup
*******************************************************
*******************************************************
* Formatting HDFS Namenode
*******************************************************
16/07/18 23:18:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = apex69.llnl.gov/192.168.123.69
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.2
<snip>
<snip>
<snip>
16/07/18 23:18:26 INFO common.Storage: Storage directory /p/lscratchg/achu/testing/hdfsoverlustre/1174570/node-0/dfs/name has been successfully formatted.
16/07/18 23:18:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/07/18 23:18:26 INFO util.ExitUtil: Exiting with status 0
16/07/18 23:18:26 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at apex69.llnl.gov/192.168.123.69
************************************************************/
*******************************************************
* Post Setup Complete
*******************************************************
2) Next we get some details of the job
*******************************************************
* Magpie General Job Info
*
* Job Nodelist: apex[69-77]
* Job Nodecount: 9
* Job Timelimit in Minutes: 90
* Job Name: test
* Job ID: 1174570
*
*******************************************************
3) Hadoop begins to launch and startup daemons on all cluster nodes.
Starting hadoop
16/07/18 23:18:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [apex69]
apex69: starting namenode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-namenode-apex69.out
apex75: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex75.out
apex77: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex77.out
apex74: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex74.out
apex73: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex73.out
apex72: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex72.out
apex70: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex70.out
apex76: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex76.out
apex71: starting datanode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-datanode-apex71.out
Starting secondary namenodes [apex69]
apex69: starting secondarynamenode, logging to /tmp/achu/hadoop/test/1174570/log/hadoop-achu-secondarynamenode-apex69.out
16/07/18 23:19:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-resourcemanager-apex69.out
apex70: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex70.out
apex73: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex73.out
apex77: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex77.out
apex75: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex75.out
apex76: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex76.out
apex71: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex71.out
apex72: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex72.out
apex74: starting nodemanager, logging to /tmp/achu/hadoop/test/1174570/log/yarn-achu-nodemanager-apex74.out
Waiting 30 seconds to allows Hadoop daemons to setup
4) Next, we see output with details of the Hadoop setup. You'll find
addresses indicating web services you can access to get detailed
job information. You'll also find information about how to login
to access Hadoop directly and how to shut down the job early if you
so desire.
*******************************************************
*
* Hadoop Information
*
* You can view your Hadoop status by launching a web browser and pointing to ...
*
* Yarn Resource Manager: http://apex69:8088
*
* Job History Server: http://apex69:19888
*
* HDFS Namenode: http://apex69:50070
* HDFS DataNode: http://<DATANODE>:50075
*
* HDFS can be accessed directly at:
*
* hdfs://apex69:54310
*
* To access Hadoop directly, you'll want to:
*
* ssh apex69
* setenv JAVA_HOME "/usr/lib/jvm/jre-1.7.0-oracle.x86_64/"
* setenv HADOOP_HOME "/home/achu/hadoop/hadoop-2.7.2"
* setenv HADOOP_CONF_DIR "/tmp/achu/hadoop/test/1174570/conf"
*
* Then you can do as you please. For example to interact with the Hadoop filesystem:
*
* $HADOOP_HOME/bin/hdfs dfs ...
*
* To launch jobs you'll want to:
*
* $HADOOP_HOME/bin/hadoop jar ...
*
* To end/cleanup your session & kill all daemons, login and set
* environment variables per the instructions above, then run:
*
* $HADOOP_HOME/sbin/stop-yarn.sh
* $HADOOP_HOME/sbin/stop-dfs.sh
* $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
*
*******************************************************
<snip>
<snip>
<snip>
*******************************************************
* Run
*
* ssh apex69 kill -s 10 27588
*
* to exit job early.
*
* Warning: killing early may not kill jobs submitted to an internally
* managed scheduler within Magpie. The job will be canceled during teardown
* of daemons. Extraneous error messages from your job may occur until then.
*
* Magpie is aware that Yarn has been launched. If user wishes to
* kill all currently submitted Yarn jobs in the PREP or
* RUNNING state to be killed before killing the job, run:
*
* ssh apex69 kill -s 28 27588
*******************************************************
5) The job then runs Teragen
*******************************************************
* Executing TeraGen
*******************************************************
Running bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen -Dmapreduce.job.maps=38 -Ddfs.datanode.drop.cache.behind.reads=true -Ddfs.datanode.drop.cache.behind.writes=true 50000000 terasort-teragen
16/07/18 23:20:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/18 23:20:01 INFO client.RMProxy: Connecting to ResourceManager at apex69/192.168.123.69:8032
16/07/18 23:20:17 INFO terasort.TeraSort: Generating 50000000 using 38
16/07/18 23:20:29 INFO mapreduce.JobSubmitter: number of splits:38
16/07/18 23:20:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468909152215_0001
16/07/18 23:20:35 INFO impl.YarnClientImpl: Submitted application application_1468909152215_0001
16/07/18 23:20:36 INFO mapreduce.Job: The url to track the job: http://apex69:8088/proxy/application_1468909152215_0001/
16/07/18 23:20:36 INFO mapreduce.Job: Running job: job_1468909152215_0001
16/07/18 23:20:46 INFO mapreduce.Job: Job job_1468909152215_0001 running in uber mode : false
16/07/18 23:20:46 INFO mapreduce.Job: map 0% reduce 0%
16/07/18 23:21:01 INFO mapreduce.Job: map 13% reduce 0%
16/07/18 23:21:02 INFO mapreduce.Job: map 16% reduce 0%
16/07/18 23:21:03 INFO mapreduce.Job: map 19% reduce 0%
16/07/18 23:21:07 INFO mapreduce.Job: map 21% reduce 0%
16/07/18 23:21:08 INFO mapreduce.Job: map 22% reduce 0%
16/07/18 23:21:10 INFO mapreduce.Job: map 27% reduce 0%
16/07/18 23:21:13 INFO mapreduce.Job: map 33% reduce 0%
16/07/18 23:21:17 INFO mapreduce.Job: map 34% reduce 0%
16/07/18 23:21:20 INFO mapreduce.Job: map 42% reduce 0%
16/07/18 23:21:22 INFO mapreduce.Job: map 55% reduce 0%
16/07/18 23:21:23 INFO mapreduce.Job: map 62% reduce 0%
16/07/18 23:21:24 INFO mapreduce.Job: map 67% reduce 0%
16/07/18 23:21:25 INFO mapreduce.Job: map 80% reduce 0%
16/07/18 23:21:26 INFO mapreduce.Job: map 82% reduce 0%
16/07/18 23:21:28 INFO mapreduce.Job: map 87% reduce 0%
16/07/18 23:21:29 INFO mapreduce.Job: map 95% reduce 0%
16/07/18 23:21:30 INFO mapreduce.Job: map 97% reduce 0%
16/07/18 23:21:31 INFO mapreduce.Job: map 100% reduce 0%
16/07/18 23:22:30 INFO mapreduce.Job: Job job_1468909152215_0001 completed successfully
16/07/18 23:22:30 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=4536430
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=3252
HDFS: Number of bytes written=5000000000
HDFS: Number of read operations=152
HDFS: Number of large read operations=0
HDFS: Number of write operations=76
Job Counters
Launched map tasks=38
Other local map tasks=38
Total time spent by all maps in occupied slots (ms)=5960619
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1986873
Total vcore-milliseconds taken by all map tasks=1986873
Total megabyte-milliseconds taken by all map tasks=5086394880
Map-Reduce Framework
Map input records=50000000
Map output records=50000000
Input split bytes=3252
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1115
CPU time spent (ms)=145580
Physical memory (bytes) snapshot=14764904448
Virtual memory (bytes) snapshot=103846907904
Total committed heap usage (bytes)=38351667200
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=107387891658806101
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=5000000000
6) The job then runs Terasort
*******************************************************
* Executing TeraSort
*******************************************************
Running bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar terasort -Dmapreduce.job.reduces=16 -Ddfs.datanode.drop.cache.behind.reads=true -Ddfs.datanode.drop.cache.behind.writes=true terasort-teragen terasort-sort
16/07/18 23:23:01 INFO terasort.TeraSort: starting
16/07/18 23:23:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/18 23:23:02 INFO input.FileInputFormat: Total input paths to process : 38
Spent 170ms computing base-splits.
Spent 3ms computing TeraScheduler splits.
Computing input splits took 173ms
Sampling 10 splits of 38
Making 16 from 100000 sampled records
Computing parititions took 4961ms
Spent 5137ms computing partitions.
16/07/18 23:23:07 INFO client.RMProxy: Connecting to ResourceManager at apex69/192.168.123.69:8032
16/07/18 23:23:21 INFO mapreduce.JobSubmitter: number of splits:38
16/07/18 23:23:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468909152215_0002
16/07/18 23:23:22 INFO impl.YarnClientImpl: Submitted application application_1468909152215_0002
16/07/18 23:23:22 INFO mapreduce.Job: The url to track the job: http://apex69:8088/proxy/application_1468909152215_0002/
16/07/18 23:23:22 INFO mapreduce.Job: Running job: job_1468909152215_0002
16/07/18 23:23:28 INFO mapreduce.Job: Job job_1468909152215_0002 running in uber mode : false
16/07/18 23:23:28 INFO mapreduce.Job: map 0% reduce 0%
16/07/18 23:23:41 INFO mapreduce.Job: map 16% reduce 0%
16/07/18 23:23:44 INFO mapreduce.Job: map 32% reduce 0%
16/07/18 23:23:45 INFO mapreduce.Job: map 96% reduce 0%
16/07/18 23:23:46 INFO mapreduce.Job: map 100% reduce 0%
16/07/18 23:23:53 INFO mapreduce.Job: map 100% reduce 25%
16/07/18 23:23:54 INFO mapreduce.Job: map 100% reduce 67%
16/07/18 23:23:56 INFO mapreduce.Job: map 100% reduce 69%
16/07/18 23:23:57 INFO mapreduce.Job: map 100% reduce 70%
16/07/18 23:23:59 INFO mapreduce.Job: map 100% reduce 73%
16/07/18 23:24:00 INFO mapreduce.Job: map 100% reduce 80%
16/07/18 23:24:02 INFO mapreduce.Job: map 100% reduce 83%
16/07/18 23:24:03 INFO mapreduce.Job: map 100% reduce 85%
16/07/18 23:24:05 INFO mapreduce.Job: map 100% reduce 88%
16/07/18 23:24:06 INFO mapreduce.Job: map 100% reduce 91%
16/07/18 23:24:08 INFO mapreduce.Job: map 100% reduce 92%
16/07/18 23:24:09 INFO mapreduce.Job: map 100% reduce 97%
16/07/18 23:24:11 INFO mapreduce.Job: map 100% reduce 98%
16/07/18 23:24:12 INFO mapreduce.Job: map 100% reduce 99%
16/07/18 23:24:15 INFO mapreduce.Job: map 100% reduce 100%
16/07/18 23:24:41 INFO mapreduce.Job: Job job_1468909152215_0002 completed successfully
16/07/18 23:24:42 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=5200006366
FILE: Number of bytes written=10406540642
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5000004712
HDFS: Number of bytes written=5000000000
HDFS: Number of read operations=162
HDFS: Number of large read operations=0
HDFS: Number of write operations=32
Job Counters
Launched map tasks=38
Launched reduce tasks=16
Data-local map tasks=26
Rack-local map tasks=12
Total time spent by all maps in occupied slots (ms)=1282143
Total time spent by all reduces in occupied slots (ms)=2726265
Total time spent by all map tasks (ms)=427381
Total time spent by all reduce tasks (ms)=545253
Total vcore-milliseconds taken by all map tasks=427381
Total vcore-milliseconds taken by all reduce tasks=545253
Total megabyte-milliseconds taken by all map tasks=1094095360
Total megabyte-milliseconds taken by all reduce tasks=2791695360
Map-Reduce Framework
Map input records=50000000
Map output records=50000000
Map output bytes=5100000000
Map output materialized bytes=5200003648
Input split bytes=4712
Combine input records=0
Combine output record=0
Reduce input groups=50000000
Reduce shuffle bytes=5200003648
Reduce input records=50000000
Reduce output records=50000000
Spilled Records=100000000
Shuffled Maps =608
Failed Shuffles=0
Merged Map outputs=608
GC time elapsed (ms)=4693
CPU time spent (ms)=440090
Physical memory (bytes) snapshot=40267489280
Virtual memory (bytes) snapshot=183285145600
Total committed heap usage (bytes)=63197675520
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=5000000000
File Output Format Counters
Bytes Written=5000000000
7) With the job complete, Magpie now tears down the session and cleans
up all daemons.
Stopping hadoop
stopping yarn daemons
stopping resourcemanager
apex71: stopping nodemanager
apex73: stopping nodemanager
apex70: stopping nodemanager
apex75: stopping nodemanager
apex76: stopping nodemanager
apex74: stopping nodemanager
apex72: stopping nodemanager
apex77: stopping nodemanager
no proxyserver to stop
stopping historyserver
Saving namespace before shutting down hdfs ...
Running bin/hdfs dfsadmin -safemode enter
16/07/18 23:31:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is ON
Running bin/hdfs dfsadmin -saveNamespace
16/07/18 23:31:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Save namespace successful
Running bin/hdfs dfsadmin -safemode leave
16/07/18 23:31:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
16/07/18 23:31:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [apex69]
apex69: stopping namenode
apex73: stopping datanode
apex76: stopping datanode
apex74: stopping datanode
apex72: stopping datanode
apex77: stopping datanode
apex71: stopping datanode
apex70: stopping datanode
apex75: stopping datanode
Stopping secondary namenodes [apex69]
apex69: stopping secondarynamenode
Hadoop Patching
---------------
Hadoop
- A patch to support non-ssh remote execution may be needed in some
environments. The patch can be applied directly to the startup
scripts; no recompilation of the source is needed.
Patches for this can be found in the patches/hadoop/ directory with
'alternate-ssh' in the filename.
The alternate remote execution command must be specified in the
environment variable MAGPIE_REMOTE_CMD.
This patch should be applied first.
- Hadoop terasort may require recompilation if running over 'rawnetworkfs'.
There is a bug in the Terasort example, leading to issues with
running Terasort against a parallel file system directly. I submitted
a patch in this Jira:
https://issues.apache.org/jira/browse/MAPREDUCE-5528
The patch to correct this is included in the patches/hadoop/ directory.
- If MAGPIE_NO_LOCAL_DIR support is desired, patches with the
"no-local-dir.patch" suffix in the filename can be found in
patches/hadoop/. See README.no-local-dir for more details.
This patch should be applied second, after 'alternate-ssh' if one is
available.
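As a sketch, applying one of these patches typically looks like the
following; the exact patch filename depends on your Hadoop version, so
the one below is hypothetical:

  cd ${HOME}/hadoop/hadoop-2.7.2
  # Adjust the -p strip level if the patch does not apply cleanly
  patch -p1 < ${HOME}/magpie/patches/hadoop/hadoop-2.7.2-alternate-ssh.patch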
Basics of HDFS over Lustre/NetworkFS
------------------------------------
Instead of using local disk, we designate a Lustre/network file system
directory to "emulate" local disk for each compute node. For example,
let's say you have 4 compute nodes. If we create the following paths
in Lustre:
/lustre/myusername/node-0
/lustre/myusername/node-1
/lustre/myusername/node-2
/lustre/myusername/node-3
We can then give each of these paths to one of the compute nodes,
which treats it like a local disk. HDFS operates on top of these
directories just as though there were a local disk on the server.
Q: Does that mean HDFS starts with a clean slate every time I
start a job?
A: No, using node ranks, "disk-paths" can be consistently assigned to
nodes so that all your HDFS files from a previous run can exist on
a later run. The next time you run your job, it doesn't matter
what server you're running on, b/c your scheduler/resource manager
will assign that node its appropriate rank. The node will
subsequently load HDFS from its appropriate directory.
Q: But that'll mean I have to consistently use the same number of
cluster nodes?
A: Generally speaking no, but you can hit issues if you don't. Just
imagine what HDFS issues would arise if you were on a traditional
Hadoop cluster and added or removed nodes.
Generally speaking, increasing the number of nodes you use for a
job is fine. Data you currently have in HDFS is still there and
readable, but it is not viewed as "local" according to HDFS and
more network transfers will have to happen. You may wish to
rebalance the HDFS blocks though. The convenience script
hadoop-rebalance-hdfs-over-lustre-or-hdfs-over-networkfs-if-increasing-nodes-script.sh
can be used to do this.
(Special Note: The start-balancer.sh that is normally used probably
will not work. All of the paths are in Lustre/NetworkFS, so the "free
space" on each "local" path is identical, messing up calculations for
balancing (i.e. no "local disk" is more/less utilized than another).)
Decreasing nodes is a bit more dangerous, as data can "disappear"
just like if it were on a traditional Hadoop cluster. If you try
to scale down the number of nodes, you should go through the
process of "decommissioning" nodes like on a real cluster,
otherwise you may lose data. You can decommission nodes through
the "decommissionhdfsnodes" option in HADOOP_MODE.
Q: Can multiple Magpie instances be run in parallel based on the same
lustre/networkfs path?
A: No, only one HDFS namenode can operate out of a specific set of
paths. If you imagine a traditional Hadoop cluster, what you
effectively are trying to do is start two HDFS file systems out of
the same local disk.
If you wish to run multiple Magpie instances with HDFS over Lustre
or HDFS over a NetworkFS, all you need to do is configure different
base directories for those instances, via the
HADOOP_HDFSOVERLUSTRE_PATH or HADOOP_HDFSOVERNETWORKFS_PATH
settings respectively.
If you are not concerned about data persisting between jobs, you
may also consider using the MAGPIE_ONE_TIME_RUN option to always
force HDFS paths to be different for each job. This setting may be
particularly useful if you are initially running tests/experiments on
CPU counts, node counts, settings, etc. and want to run many jobs
in parallel.
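For example, two jobs could run concurrently against the same Lustre
filesystem by giving each its own base directory (paths hypothetical):

  # Job A's submission script
  export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/myusername/hdfsoverlustre-projectA/"

  # Job B's submission script
  export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/myusername/hdfsoverlustre-projectB/"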
Q: What should HDFS replication be?
A: The scripts in this package default to HDFS replication of 3 when
HDFS over Lustre is used. If HDFS replication is > 1, it can
improve performance of your job reading in HDFS input b/c there
will be fewer network transfers of data (i.e. Hadoop may need to
transfer "non-local" data to another node). In addition, if a
datanode were to die (i.e. a node crashes) Hadoop has the ability
to survive the crash just like in a traditional Hadoop cluster.
The trade-off is space and HDFS writes vs HDFS reads. With lower
HDFS replication (lowest is 1) you save space and decrease time for
writes. With increased HDFS replication, you perform better on
reads.
Q: What if I need to upgrade the HDFS version I'm using?
A: If you want to use a different Hadoop version than what you started
with, you will have to go through the normal upgrade or rollback
procedures for Hadoop.
With Hadoop versions 2.2.0 and newer, there is a seamless upgrade
path done by specifying "-upgrade" when running the "start-dfs.sh"
script. This is implemented in the "upgradehdfs" option for
HADOOP_MODE in the launch scripts.
Pro vs Con of HDFS over Lustre/NetworkFS vs. Posix FS (e.g. rawnetworkfs, etc.)
-------------------------------------------------------------------------------
Here are some pros vs. cons of using a network filesystem directly vs
HDFS over Lustre/NetworkFS.
HDFS over Lustre/NetworkFS:
Pro: Portability w/ code that runs against a "traditional" Hadoop
cluster. If it runs on a "traditional" Hadoop cluster w/ local disk,
it should run fine w/ HDFS over Lustre/NetworkFS.
Con: Must always run job w/ Hadoop & HDFS running as a job.
Con: Must "import" and "export" data from HDFS using job runs, cannot
read/write directly. On some clusters, this may involve a double copy
of data. e.g. first need to copy data into the cluster, then run job to
copy data into HDFS over Lustre/NetworkFS.
Con: Possible difficulty changing job size on clusters.
Con: If HDFS replication > 1, more space used up.
Posix FS directly:
Pro: You can read/write files to Lustre without Hadoop/HDFS running.
Pro: Less space used up.
Pro: Can adjust job size easily.
Con: Portability issues w/ code that usually runs on HDFS. As an
example, HDFS has no concept of a working directory while Posix
filesystems do. In addition, absolute paths will be different. Code
will have to be adjusted for this.
Con: User must "manage" and organize their files directly, not gaining
the block advantages of HDFS. If not handled well, this can lead to
performance issues. For example, a Hadoop job that creates a 1
terabyte file under HDFS is creating a file made up of smaller HDFS
blocks. The same job may create a single 1 terabyte file when
accessing the Posix FS directly. In the case of Lustre, striping of
the file must be handled by the user to ensure satisfactory
performance.
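For example, on Lustre a user can pre-set striping on an output
directory with the lfs tool; the stripe count below is illustrative
and should be tuned for your filesystem:

  # Stripe new files in this directory across 16 OSTs
  lfs setstripe -c 16 /lustre/myusername/output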
Hadoop Troubleshooting
----------------------
1) What does "Waiting 30 more seconds for namenode to exit safe mode"
mean?
When HDFS (regardless if it is HDFS over Lustre or otherwise) is
being brought up, the HDFS namenode must communicate with all
datanodes to discover what blocks of data are available. This
process can take a while if you have a very large amount of data in
HDFS.
In rarer circumstances, it can be due to a bug/error in which the
HDFS namenode has not come up at all (i.e. the HDFS namenode isn't
running) or a large number of datanodes have not come up due to
errors. Check the appropriate log files to debug further.
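If you suspect the namenode is stuck, you can check safe mode status
directly from the master node (after setting the environment variables
per your job output), for example:

  ${HADOOP_HOME}/bin/hdfs dfsadmin -safemode get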
Hadoop Advanced Usage
---------------------
1) If your cluster has a local SSD or NVRAM on each node, set a path
to it via the HADOOP_LOCALSTORE environment variable in your
submission scripts. It will allow Hadoop to store intermediate
shuffle data to it. It should significantly improve performance.
2) Magpie configures the default number of reducers in a Hadoop job to
the number of compute nodes in your allocation. This is
significantly superior to the Hadoop default of 1; however, it may
not be optimal for many jobs. Users should play around with the
number of reducers in their mapreduce jobs to improve performance.
The default can be tweaked in the submission scripts via the
HADOOP_DEFAULT_REDUCE_TASKS environment variable.
3) Similarly Magpie configures the default number of map tasks in a
Hadoop job depending on the number of compute nodes in your
allocation and the number of cores on nodes. However, this may not
be optimal in many situations, such as nodes that have a low memory
to cpu ratio. Users should play around with the number of map
tasks in their mapreduce jobs to improve performance. The default
can be tweaked in the submission scripts via the
HADOOP_DEFAULT_MAP_TASKS environment variable.
4) Magpie configures a relatively conservative amount of memory for
Hadoop, currently 80% of system memory. While there should always
be a buffer to allow the operating system, system daemons, and
Hadoop daemons to operate, the 80% value may be on the conservative
side, and users wishing to push it higher to 90% or 95% of system
memory may see benefits.
Users can adjust the amount of memory used on each node through the
YARN_RESOURCE_MEMORY environment variable in the submission
scripts. (Note: this applies only to Hadoop 2.0 and up.)
Users can also turn off memory checking for YARN containers with
YARN_VMEM_CHECK and YARN_PMEM_CHECK environment variables in the
submission scripts.
5) Depending on whether your job is map heavy or reduce heavy,
adjusting the memory container sizes of map and reduce tasks may
also benefit job performance. These adjustments could lead to
larger buffer sizes on individual tasks if memory sizes are
increased or allow more tasks to be run in parallel if memory size
is decreased.
This can be adjusted through the HADOOP_CHILD_HEAPSIZE,
HADOOP_CHILD_MAP_HEAPSIZE, and HADOOP_CHILD_REDUCE_HEAPSIZE
environment variables in the submission scripts.
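A sketch of such adjustments in a submission script might look like
the following; the values are illustrative only, and the expected
units/format are described in the comments of the submission scripts:

  export YARN_RESOURCE_MEMORY=115000
  export HADOOP_CHILD_MAP_HEAPSIZE=4096
  export HADOOP_CHILD_REDUCE_HEAPSIZE=8192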
6) The mapreduce slowstart configuration determines the percentage of
map tasks that must complete before reducers begin. It defaults to
a very conservative value of 0.05 (i.e. 5%). This will be
non-optimal for many jobs, including jobs that have relatively
non-computationally heavy reduce tasks. The reduce tasks will take
up job resources (task slots, memory, etc.) that might be otherwise
useful for map tasks. Many users may wish to experiment with higher
percentages. The default can be tweaked in the
submission scripts via the HADOOP_MAPREDUCE_SLOWSTART environment
variable.
7) By default Magpie disables compression in Hadoop. For some jobs, if
the data is particularly large, the savings from reduced I/O may
outweigh the time spent compressing/decompressing data, benefiting
overall job performance. Compression can be enabled through the
HADOOP_COMPRESSION environment variable in the submission scripts.
8) By default Magpie runs as the user who submitted the job. In the
event that other users need to use the same Yarn cluster, you can
allow individual users or groups by manually adding the following to
your Magpie script:
YARN_QUEUES_EXTRA_USERS : Allows extra system users to be added to
the default yarn queue in order to run jobs.
YARN_QUEUES_EXTRA_GROUPS : Allows extra system groups to be added to
the default yarn queue in order to run jobs.