
Commit

Fixed syntax for readability
jessica0xdata committed Oct 23, 2014
1 parent d194373 commit 4e6cd26
Showing 1 changed file with 44 additions and 31 deletions.
h2o-docs/source/faq/hadoop_related.rst

Common Hadoop Questions
""""""""""""""""""""""""
**What's the syntax for the file path of a data set sitting in HDFS?**

To locate an HDFS file, go to **Data > Import** and enter **hdfs://** in the **path** field. H\ :sub:`2`\ O automatically detects any HDFS paths. This is a good way to verify the path to your data set before importing through R or any other non-web API.
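
For example, a complete HDFS path (hypothetical host, port, and file names) looks like this:

::

  hdfs://namenode.example.com:8020/user/jsmith/airlines.csv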

---

**When interacting with an H\ :sub:`2`\ O cluster launched on multiple Hadoop nodes, is it necessary for R to be installed on all the data nodes?**

No - as long as the R instance can communicate with one of the nodes in the network, R can be installed on any of the nodes, or even on a local machine that will securely tunnel into the cluster.

---

**Is it possible to launch the H\ :sub:`2`\ O cluster on Hadoop nodes using R’s** `h2o.init()` **command?**

No - follow the instructions in :ref:`Hadoop_Tutorial` and add the IP address to the `h2o.init()` function to connect to the cluster.
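
As a sketch in R (the IP address is hypothetical; the cluster must already be running on Hadoop), connecting looks like this:

::

  library(h2o)
  # Attach to the existing H2O cluster rather than starting a local one
  h2o.init(ip = "10.115.57.45", port = 54321)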

---

**What does** `"ERROR: Output directory hdfs://sandbox.hortonworks.com:8020/user/root/hdfsOutputDir already exists?"` **mean?**

Each mapper task gets its own output directory in HDFS. To prevent overwriting multiple users' files, each mapper task must have a unique output directory name. Change the `-output hdfsOutputDir` argument to `-output hdfsOutputDir1` and the task should launch.
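
For example, a relaunch along these lines (the driver jar name varies by Hadoop distribution and H\ :sub:`2`\ O version) uses the new directory name:

::

  hadoop jar h2odriver_hdp2.1.jar water.hadoop.h2odriver \
      -libjars ../h2o.jar -mapperXmx 1g -nodes 1 -output hdfsOutputDir1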

---

**What should I do if H\ :sub:`2`\ O starts to launch but times out in 120 seconds?**

1. YARN or MapReduce is not configured correctly. Enable launching of mapper tasks with the specified memory sizes. If YARN only allows mapper tasks with a maximum memory size of 1g and the request requires 2g, then the request will time out at the default of 120 seconds. Read `Configuration Setup <http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/>`_ to make sure your setup will run.

2. The nodes are not communicating with each other. If you request a cluster of two nodes and the output shows a stall in reporting the other nodes and forming a cluster (as shown in the following example), check that the security settings for the network connection between the two nodes are not preventing the nodes from communicating with each other. You should also check to make sure that the flatfile that is generated and being passed has the correct home address; if there are multiple local IP addresses, this could be an issue.

::

  [Sending flatfile to node 10.115.5.25:54321]
  H2O node 10.115.57.45:54321 reports H2O cluster size 1
  H2O node 10.115.5.25:54321 reports H2O cluster size 1
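
The flatfile passed to the nodes is simply a list of node addresses, one per line. For this hypothetical two-node cluster, it would contain:

::

  10.115.57.45:54321
  10.115.5.25:54321
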
---

**What should I do if the H\ :sub:`2`\ O job launches but terminates after 600 seconds?**

The likely cause is a driver mismatch - check to make sure the Hadoop distribution matches the driver jar file used to launch H\ :sub:`2`\ O. If your distribution is not currently
available in the package, `email us <support@0xdata>`_ for a new driver file.

---

**What should I do if I want to create a job with a bigger heap size but YARN doesn't launch and H\ :sub:`2`\ O times out?**

First, try the job again but with a smaller heap size (`-mapperXmx`) and a smaller number of nodes (`-nodes`) to verify that a small launch can proceed at all.

If the cluster manager settings are configured for the default maximum memory size but the memory required for the request exceeds that amount, YARN will not launch and H\ :sub:`2`\ O will time out.
If you have a default configuration, change the configuration settings in your cluster manager to enable launching of mapper tasks for specific memory sizes. Use the following formula to calculate the amount of memory required:
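
A sketch of the calculation, assuming the YARN container must hold the mapper heap (`-mapperXmx`) plus the overhead set by the `-extramempercent` driver option (the values below are purely illustrative):

::

  mapreduce.map.memory.mb = mapperXmx * (1 + extramempercent / 100)

  e.g., -mapperXmx 30g (30720 MB) with -extramempercent 20:
       30720 * 1.20 = 36864 MB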

Output from an H\ :sub:`2`\ O launch is shown below:

::

  mapreduce.map.memory.mb: 36864


`mapreduce.map.memory.mb` must be less than the YARN memory configuration values for the launch to succeed. See the examples below for how to change the memory configuration values for your version of Hadoop.


**For Cloudera, configure the settings in Cloudera Manager. Depending on how the cluster is configured, you may need to change the settings for more than one role group.**

1. Click **Configuration** and enter the following search term in quotes: **yarn.nodemanager.resource.memory-mb**.

2. Enter the amount of memory (in GB) to allocate in the **Value** field. If more than one group is listed, change the values for all listed groups.

.. image:: TroubleshootingHadoopClouderayarnnodemgr.png
:width: 100 %

3. Click the **Save Changes** button in the upper-right corner.
4. Enter the following search term in quotes: **yarn.scheduler.maximum-allocation-mb**
5. Change the value, click the **Save Changes** button in the upper-right corner, and redeploy.

.. image:: TroubleshootingHadoopClouderayarnscheduler.png
:width: 100%


**For Hortonworks,** `configure <http://docs.hortonworks.com/HDPDocuments/Ambari-1.6.0.0/bk_Monitoring_Hadoop_Book/content/monitor-chap2-3-3_2x.html>`_ **the settings in Ambari.**

1. Select **YARN**, then click the **Configs** tab.
2. Select the group.
3. In the **Node Manager** section, enter the amount of memory (in MB) to allocate in the **yarn.nodemanager.resource.memory-mb** entry field.

.. image:: TroubleshootingHadoopAmbariNodeMgr.png
:width: 100 %

4. In the **Scheduler** section, enter the amount of memory (in MB) to allocate in the **yarn.scheduler.maximum-allocation-mb** entry field.

.. image:: TroubleshootingHadoopAmbariyarnscheduler.png
:width: 100 %

5. Click the **Save** button at the bottom of the page and redeploy the cluster.


**For MapR:**

1. Edit the **yarn-site.xml** file for the node running the ResourceManager.
2. Change the values for the `yarn.nodemanager.resource.memory-mb` and `yarn.scheduler.maximum-allocation-mb` properties.
3. Restart the ResourceManager and redeploy the cluster.


To verify the values were changed, check the values for the following properties:

- `<name>yarn.nodemanager.resource.memory-mb</name>`
- `<name>yarn.scheduler.maximum-allocation-mb</name>`
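
In **yarn-site.xml**, the corresponding entries would look something like this (the values are illustrative; use the sizes your jobs actually need):

::

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>36864</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>36864</value>
  </property>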

---
