Skip to content

Commit

Permalink
HDFS-6712. Document HDFS Multihoming Settings. (Contributed by Arpit …
Browse files Browse the repository at this point in the history
…Agarwal)

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1612695 13f79535-47bb-0310-9956-ffa450edef68
  • Loading branch information
arp7 committed Jul 22, 2014
1 parent 60b1e83 commit fee737e
Show file tree
Hide file tree
Showing 3 changed files with 148 additions and 0 deletions.
2 changes: 2 additions & 0 deletions hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Original file line number Diff line number Diff line change
Expand Up @@ -602,6 +602,8 @@ Release 2.5.0 - UNRELEASED
HDFS-6680. BlockPlacementPolicyDefault does not choose favored nodes
correctly. (szetszwo)

HDFS-6712. Document HDFS Multihoming Settings. (Arpit Agarwal)

OPTIMIZATIONS

HDFS-6214. Webhdfs has poor throughput for files >2GB (daryn)
Expand Down
145 changes: 145 additions & 0 deletions hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

---
Hadoop Distributed File System-${project.version} - Support for Multi-Homed Networks
---
---
${maven.build.timestamp}

HDFS Support for Multihomed Networks

This document is targetted to cluster administrators deploying <<<HDFS>>> in
multihomed networks. Similar support for <<<YARN>>>/<<<MapReduce>>> is
work in progress and will be documented when available.

%{toc|section=1|fromDepth=0}

* Multihoming Background

In multihomed networks the cluster nodes are connected to more than one
network interface. There could be multiple reasons for doing so.

[[1]] <<Security>>: Security requirements may dictate that intra-cluster
traffic be confined to a different network than the network used to
transfer data in and out of the cluster.

[[2]] <<Performance>>: Intra-cluster traffic may use one or more high bandwidth
interconnects like Fiber Channel, Infiniband or 10GbE.

[[3]] <<Failover/Redundancy>>: The nodes may have multiple network adapters
connected to a single network to handle network adapter failure.


Note that NIC Bonding (also known as NIC Teaming or Link
Aggregation) is a related but separate topic. The following settings
are usually not applicable to a NIC bonding configuration which handles
multiplexing and failover transparently while presenting a single 'logical
network' to applications.

* Fixing Hadoop Issues In Multihomed Environments

** Ensuring HDFS Daemons Bind All Interfaces

By default <<<HDFS>>> endpoints are specified as either hostnames or IP addresses.
In either case <<<HDFS>>> daemons will bind to a single IP address making
the daemons unreachable from other networks.

The solution is to have separate setting for server endpoints to force binding
the wildcard IP address <<<INADDR_ANY>>> i.e. <<<0.0.0.0>>>. Do NOT supply a port
number with any of these settings.

----
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
<description>
The actual address the RPC server will bind to. If this optional address is
set, it overrides only the hostname portion of dfs.namenode.rpc-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node listen on all interfaces by
setting it to 0.0.0.0.
</description>
</property>

<property>
<name>dfs.namenode.servicerpc-bind-host</name>
<value>0.0.0.0</value>
<description>
The actual address the service RPC server will bind to. If this optional address is
set, it overrides only the hostname portion of dfs.namenode.servicerpc-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node listen on all interfaces by
setting it to 0.0.0.0.
</description>
</property>

<property>
<name>dfs.namenode.http-bind-host</name>
<value>0.0.0.0</value>
<description>
The actual adress the HTTP server will bind to. If this optional address
is set, it overrides only the hostname portion of dfs.namenode.http-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node HTTP server listen on all
interfaces by setting it to 0.0.0.0.
</description>
</property>

<property>
<name>dfs.namenode.https-bind-host</name>
<value>0.0.0.0</value>
<description>
The actual adress the HTTPS server will bind to. If this optional address
is set, it overrides only the hostname portion of dfs.namenode.https-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node HTTPS server listen on all
interfaces by setting it to 0.0.0.0.
</description>
</property>
----

** Clients use Hostnames when connecting to DataNodes

By default <<<HDFS>>> clients connect to DataNodes using the IP address
provided by the NameNode. Depending on the network configuration this
IP address may be unreachable by the clients. The fix is letting clients perform
their own DNS resolution of the DataNode hostname. The following setting
enables this behavior.

----
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
<description>Whether clients should use datanode hostnames when
connecting to datanodes.
</description>
</property>
----

** DataNodes use HostNames when connecting to other DataNodes

Rarely, the NameNode-resolved IP address for a DataNode may be unreachable
from other DataNodes. The fix is to force DataNodes to perform their own
DNS resolution for inter-DataNode connections. The following setting enables
this behavior.

----
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
<description>Whether datanodes should use datanode hostnames when
connecting to other datanodes for data transfer.
</description>
</property>
----

1 change: 1 addition & 0 deletions hadoop-project/src/site/site.xml
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,7 @@
<item name="HDFS NFS Gateway" href="hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html"/>
<item name="HDFS Rolling Upgrade" href="hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html"/>
<item name="Extended Attributes" href="hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html"/>
<item name="HDFS Support for Multihoming" href="hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html"/>
</menu>

<menu name="MapReduce" inherit="top">
Expand Down

0 comments on commit fee737e

Please sign in to comment.