
Add the option to run a slave daemon on the master node. #7

Closed
wants to merge 5 commits

Conversation

@cmd-ntrf (Contributor) commented Jan 6, 2015

While testing some scenarios with Spark on our cluster, I realized that most of the time the master node was idle.

I propose to add a parameter, for now called MAGPIE_MASTER_IS_SLAVE, that when set to "yes" adds the master to the list of slaves. By default, this option is off.

The parameter could be renamed to something more meaningful; this is just what first came to mind. I assumed this behaviour is not specific to Spark, hence the choice of prefix (MAGPIE).
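Roughly, the idea is something like the following sketch (the variable holding the master hostname and the exact slave-list file used by magpie-setup-core are assumptions here, not the actual names in the patch):

    # Sketch only: MAGPIE_MASTER_HOSTNAME is a placeholder, and the slave
    # list is assumed to be built in ${MAGPIE_LOCAL_DIR}/tmp_slaves.
    if [ "${MAGPIE_MASTER_IS_SLAVE}" == "yes" ]
    then
        # Also list the master node as a slave so the slave daemons
        # (DataNode, RegionServer, Spark Worker, ...) are started on it.
        echo "${MAGPIE_MASTER_HOSTNAME}" >> ${MAGPIE_LOCAL_DIR}/tmp_slaves
    fi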

@chu11 (Member) commented Jan 6, 2015

This seems like a good idea. Question though: have you only tried this with Spark? I'm wondering how it would work with other stuff. The Hadoop HDFS setup is the one I'm wondering about, although I believe it would work.

Instead of "MAGPIE_MASTER_IS_SLAVE", how about "MAGPIE_RUN_SLAVE_ON_MASTER", which (at least to me) is clearer about what the option is doing?

I would also like to see a clearer description in the comments. How about something like:

"By default, the master node will only run master daemons, such as the Hadoop NameNode, Hbase Master, Spark Master, etc.

On certain workloads the amount of memory and processing used on the master may be low enough that it is worthwhile to also run slave node daemons on the master (e.g. Hadoop DataNode, Hbase RegionServer, Spark Worker, etc.). By setting the below to yes, you will inform Magpie to also run slave node daemons on the master. If enabled, you are advised to adjust the heaps and memory usage of all necessary daemons so that resources are allocated appropriately and performance stays decent.

By default, this is turned off."

I dunno. That's a start. Feel free to adjust as needed.

Also, could the option be placed before MAGPIE_REMOTE_CMD/MAGPIE_REMOTE_CMD_OPTS? I'd like to keep those as the last options within that section, for example along these lines (a sketch only; the exact defaults and surrounding layout are assumptions):
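    ############################################################################
    # By default, the master node will only run master daemons, such as the
    # Hadoop NameNode, Hbase Master, Spark Master, etc.  Set this to "yes"
    # to also run slave node daemons (DataNode, RegionServer, Spark Worker,
    # ...) on the master.  Off by default.
    ############################################################################
    # export MAGPIE_RUN_SLAVE_ON_MASTER="yes"

    # ... MAGPIE_REMOTE_CMD / MAGPIE_REMOTE_CMD_OPTS kept last in the section
    # export MAGPIE_REMOTE_CMD="ssh"
    # export MAGPIE_REMOTE_CMD_OPTS=""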

Thanks

@cmd-ntrf (Contributor, Author) commented Jan 6, 2015

I have updated the variable name and the documentation based on your suggestion.

I have not yet tried the parameter's effect on anything other than Spark. I wanted to submit the idea with a working patch to test the waters and see if there was anything I had missed before doing further testing.

Do you have suggestions other than simply running the different plugin test examples to make sure everything works properly with MAGPIE_RUN_SLAVE_ON_MASTER=yes? I recommend we leave this request unmerged until I get the results. I will keep you updated.

@chu11 (Member) commented Jan 6, 2015

I think all the plugin tests are a good start. At least they're a good sanity check that things work. If Hadoop terasort works with hdfsoverlustre (or any network file system), that gives high confidence things are good.

I also realized we may need to tweak the HDFS federation code in magpie-setup-core just below your changes.

I think all we'd have to do is put an if statement around

tail -n+${HDFS_FEDERATION_NAMENODE_COUNT} ${MAGPIE_LOCAL_DIR}/tmp_slaves > ${MAGPIE_LOCAL_DIR}/tmp_slaves_tmp

and not do the above if MAGPIE_RUN_SLAVE_ON_MASTER = yes.
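Something along these lines (a sketch only; the surrounding federation code and how tmp_slaves_tmp is consumed afterwards are not shown here):

    # Guard the federation adjustment so it is skipped when the master
    # also runs slave daemons.
    if [ "${MAGPIE_RUN_SLAVE_ON_MASTER}" != "yes" ]
    then
        tail -n+${HDFS_FEDERATION_NAMENODE_COUNT} ${MAGPIE_LOCAL_DIR}/tmp_slaves > ${MAGPIE_LOCAL_DIR}/tmp_slaves_tmp
    fi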

@cmd-ntrf (Contributor, Author) commented Jan 8, 2015

Hadoop terasort with hdfsoverlustre is working with MAGPIE_RUN_SLAVE_ON_MASTER="yes". I have not yet tested the other plugins, aside from Spark.

@chu11 force-pushed the master branch 3 times, most recently from 558455c to 06da05d on June 2, 2015 at 18:34
@cmd-ntrf closed this May 18, 2020