FAQ
---
* Why was Magpie written so users modify a job submission batch file
instead of using a command line tool to automatically allocate nodes
and set up daemons for you?
This is something I've been asked about a few times, since it may
make Magpie more difficult for some users, and some other Magpie-like
tools out there are command line tools.
The biggest reason is cultural to Lawrence Livermore. Historically
there have been multiple OSes, file systems, hardware, etc. across
the clusters here, so our users are taught to use our systems by
learning how to write batch files for job submission rather than by
learning a specific command line tool that does things automatically.
So by extension, Magpie offers pre-made batch scripts for users to
modify and submit.
I believe other locations have far fewer permutations of systems
and/or situations, which makes a command line tool more feasible.
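To make the workflow above concrete, here is a minimal sketch of what
"modify a pre-made batch script" looks like. The #SBATCH directives
are standard Slurm; the exported variable names are examples of the
kind of knobs Magpie's script templates expose and may not match the
shipped templates exactly.

```shell
#!/bin/bash
# Illustrative Magpie-style job script: the user edits a few settings
# in a pre-made template rather than running a command line tool.
#SBATCH --nodes=9            # e.g. 1 master + 8 workers
#SBATCH --time=60
#SBATCH --job-name=hadoop-test

# Point the script at the software to launch.  In the real templates,
# setup code further down reads these variables, writes per-job
# configuration, and starts/stops the daemons on the allocated nodes.
export HADOOP_SETUP=yes
export HADOOP_VERSION="2.7.3"
export HADOOP_HOME="${HOME}/hadoop-${HADOOP_VERSION}"

# ... remainder of the pre-made script, normally left unmodified ...
```

The user then submits the edited script as usual, e.g. with
'sbatch my-magpie-job.sbatch'.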
* Will you write a command line tool for Magpie someday?
Maybe, but it's not a high priority.
* Can Magpie work with Big Data distros, such as those from Cloudera
and Hortonworks?
I've never tested with them, so I don't know. It might work if all
of the same files/scripts from the Apache releases are still present.
* Why does Magpie work against Apache releases of Hadoop, Spark,
etc. but not necessarily those from Cloudera and Hortonworks?
The reasons are somewhat historical. In the original scripts I
wrote to support Hadoop (a set of home-grown scripts that predated
Magpie), I utilized the Hadoop scripts 'start-all.sh',
'start-dfs.sh', etc. to start and stop daemons on all nodes of an
allocation. These scripts had to be modified/patched, but their core
was unchanged.
Similar scripts also existed in Hbase, Spark, and other projects.
When looking at a Cloudera distribution, I noticed that these
scripts were removed from their distribution in favor of system
scripts out of /etc/init.d.
Since scripts out of /etc/init.d require root, I stuck to the
'start-all.sh' + etc. scripts for starting/stopping daemons and
carried it forward.
I don't know if those scripts are distributed in newer versions of
Cloudera and/or Hortonworks.
* How did Magpie come to be?
Truth be told, early on in some Hadoop investigations we were
exploring many ideas about how HPC and Hadoop could be integrated.
These included integrating Lustre into Hadoop, Infiniband into
Hadoop, and other ideas for further down the road. Various
experimental patches were created, and plugins/modules from others
were experimented with.
When presenting performance numbers and results from various
experiments, people at the meetings would ask me "How did you run
this experiment on cluster FOO?" I would respond with, "Well I
have these hacked up bash scripts ...".
After the 4th or 5th person asked if they could try out my "hacked
up bash scripts", I decided that perhaps they should be put together
into something far more formal. Later Pig, Hbase, Spark,
etc. support was added.
* Why is the project called Magpie?
Based on David Buttler's initial reply to my request for name
suggestions.
"It's Hadoop, so it should be an animal. It runs on Lustre, which
is shiny. How about Magpie?"
For those unaware, legend has it that magpies like shiny objects.
* Why aren't all project combinations supported? For example, Spark
1.5.0 has a build against Hadoop 2.4 and Phoenix 4.5.0 has a build
against Hbase 0.98 on their official websites.
Magpie primarily aims to work with the official binary builds
distributed by projects on their websites. Unfortunately, different
projects have build/release differences in their binaries.
For example, I believe most Hbase 0.98 binaries distributed via the
official Hbase website were compiled for Java 1.6, while newer
Phoenix versions were compiled for Java 1.7. So even though there
exists a Phoenix 4.5.0 that is API compatible with Hbase 0.98, it
may not be binary compatible with the Hbase 0.98 versions released
on the Hbase website. The user would need to re-compile Hbase 0.98.
This issue has come up in several circumstances.
So the somewhat shorter answer is that Magpie "should" work with all
of these combinations, but I don't list it because I don't have
tests for it in the testsuite and it would require more work.
* How did Magpie begin?
Initially, the primary work goal was to see how Hadoop performed on
Lustre. A Lustre plugin for Hadoop was developed, a Lustre plugin
from the folks at Intel was tried out, and some other
experimentation was done. In order to test things, we needed a way
to launch Hadoop on HPC clusters, so we heavily hacked/modified some
bash scripts originally developed by Kevin Regimbal @ PNNL.
Lo and behold, most people didn't care about the original work.
Most were interested in the bash scripts for starting up Hadoop in a
Slurm allocation. So a more polished set of scripts was developed,
which is Magpie. Eventually it was expanded to support Pig, Hbase,
Spark, and other projects.
* I have a question, can you help?
Please post questions to the GitHub issue tracker. I'm glad to
answer questions.