connect-on-premises-network.md

title: Connect HDInsight to your on-premises network - Azure HDInsight
description: Learn how to create an HDInsight cluster in an Azure Virtual Network, and then connect it to your on-premises network. Learn how to configure name resolution between HDInsight and your on-premises network by using a custom DNS server.
author: hrasheed-msft
ms.reviewer: jasonh
ms.service: hdinsight
ms.custom: hdinsightactive
ms.topic: conceptual
ms.date: 02/23/2018
ms.author: hrasheed

Connect HDInsight to your on-premises network

Learn how to connect HDInsight to your on-premises network by using Azure Virtual Networks and a VPN gateway. This document provides planning information on:

  • Using HDInsight in an Azure Virtual Network that connects to your on-premises network.

  • Configuring DNS name resolution between the virtual network and your on-premises network.

  • Configuring network security groups to restrict internet access to HDInsight.

  • Ports provided by HDInsight on the virtual network.

Create the virtual network configuration

Use the following documents to learn how to create an Azure Virtual Network that is connected to your on-premises network:

Configure name resolution

To allow HDInsight and resources in the joined network to communicate by name, you must perform the following actions:

  • Create a custom DNS server in the Azure Virtual Network.

  • Configure the virtual network to use the custom DNS server instead of the default Azure Recursive Resolver.

  • Configure forwarding between the custom DNS server and your on-premises DNS server.

This configuration enables the following behavior:

  • Requests for fully qualified domain names that have the DNS suffix for the virtual network are forwarded to the custom DNS server. The custom DNS server then forwards these requests to the Azure Recursive Resolver, which returns the IP address.

  • All other requests are forwarded to the on-premises DNS server. Even requests for public internet resources such as microsoft.com are forwarded to the on-premises DNS server for name resolution.

In the following diagram, green lines are requests for resources that end in the DNS suffix of the virtual network. Blue lines are requests for resources in the on-premises network or on the public internet.

Diagram of how DNS requests are resolved in the configuration used in this document
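
The forwarding rules above come down to a single decision made on the custom DNS server. The following Python sketch models that decision only; it is not how Bind is implemented. The function name and the default on-premises address (192.168.0.1, the sample value used in the Bind configuration in this document) are illustrative assumptions. 168.63.129.16 is the Azure recursive resolver.

```python
AZURE_RECURSIVE_RESOLVER = "168.63.129.16"


def pick_forwarder(fqdn, vnet_suffix, on_prem_dns="192.168.0.1"):
    """Return the server the custom DNS server forwards a query to.

    Names under the virtual network's DNS suffix go to the Azure
    recursive resolver; everything else goes to the on-premises server.
    """
    name = fqdn.rstrip(".").lower()
    suffix = vnet_suffix.strip(".").lower()
    # Match on a label boundary so "notcloudapp.net" is not treated
    # as part of "cloudapp.net".
    if name == suffix or name.endswith("." + suffix):
        return AZURE_RECURSIVE_RESOLVER
    return on_prem_dns


suffix = "icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net"
print(pick_forwarder("dnsproxy." + suffix, suffix))  # Azure resolver
print(pick_forwarder("microsoft.com", suffix))       # on-premises server
```

Matching on a label boundary rather than a plain string suffix mirrors how a DNS zone match works, which is why public names such as microsoft.com fall through to the on-premises server.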

Create a custom DNS server

Important

You must create and configure the DNS server before installing HDInsight into the virtual network.

To create a Linux VM that uses the Bind DNS software, use the following steps:

Note

The following steps use the Azure portal to create an Azure Virtual Machine. For other ways to create a virtual machine, see the following documents:

  1. From the Azure portal, select +, Compute, and Ubuntu Server 16.04 LTS.

    Create an Ubuntu virtual machine

  2. From the Basics section, enter the following information:

    • Name: A friendly name that identifies this virtual machine. For example, DNSProxy.
    • User name: The name of the SSH account.
    • SSH public key or Password: The authentication method for the SSH account. We recommend using public keys, as they are more secure. For more information, see the Create and use SSH keys for Linux VMs document.
    • Resource group: Select Use existing, and then select the resource group that contains the virtual network created earlier.
    • Location: Select the same location as the virtual network.

    Virtual machine basic configuration

    Leave other entries at the default values and then select OK.

  3. From the Choose a size section, select the VM size. For this tutorial, select the smallest and lowest cost option. To continue, use the Select button.

  4. From the Settings section, enter the following information:

    • Virtual network: Select the virtual network that you created earlier.

    • Subnet: Select the default subnet for the virtual network. Do not select the subnet used by the VPN gateway.

    • Diagnostics storage account: Either select an existing storage account or create a new one.

    Virtual network settings

    Leave the other entries at the default value, then select OK to continue.

  5. From the Purchase section, select the Purchase button to create the virtual machine.

  6. Once the virtual machine has been created, its Overview section is displayed. From the list on the left, select Properties. Save the Public IP address and Private IP address values. Both are used in the following sections.

    Public and private IP addresses

Install and configure Bind (DNS software)

  1. Use SSH to connect to the public IP address of the virtual machine. The following example connects to a virtual machine at 40.68.254.142:

    ssh sshuser@40.68.254.142

    Replace sshuser with the SSH user account you specified when creating the virtual machine.

    [!NOTE] There are a variety of ways to obtain the ssh utility. On Linux, Unix, and macOS, it is provided as part of the operating system. If you are using Windows, consider one of the following options:

  2. To install Bind, use the following commands from the SSH session:

    sudo apt-get update -y
    sudo apt-get install bind9 -y
  3. To configure Bind to forward name resolution requests to your on-premises DNS server, use the following text as the contents of the /etc/bind/named.conf.options file:

     acl goodclients {
         10.0.0.0/16; # Replace with the IP address range of the virtual network
         10.1.0.0/16; # Replace with the IP address range of the on-premises network
         localhost;
         localnets;
     };

     options {
         directory "/var/cache/bind";

         recursion yes;

         allow-query { goodclients; };

         forwarders {
             192.168.0.1; # Replace with the IP address of the on-premises DNS server
         };

         dnssec-validation auto;

         auth-nxdomain no;    # conform to RFC1035
         listen-on { any; };
     };
    

    [!IMPORTANT] Replace the values in the goodclients section with the IP address range of the virtual network and on-premises network. This section defines the addresses that this DNS server accepts requests from.

    Replace the 192.168.0.1 entry in the forwarders section with the IP address of your on-premises DNS server. This entry routes DNS requests to your on-premises DNS server for resolution.

    To edit this file, use the following command:

    sudo nano /etc/bind/named.conf.options

    To save the file, use Ctrl+X, Y, and then Enter.

  4. From the SSH session, use the following command:

    hostname -f

    This command returns a value similar to the following text:

     dnsproxy.icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net
    

    The icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net text is the DNS suffix for this virtual network. Save this value, as it is used later.

  5. To configure Bind to resolve DNS names for resources within the virtual network, use the following text as the contents of the /etc/bind/named.conf.local file:

     // Replace the following with the DNS suffix for your virtual network
     zone "icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net" {
     	type forward;
     	forwarders {168.63.129.16;}; # The Azure recursive resolver
     };
    

    [!IMPORTANT] You must replace the icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net with the DNS suffix you retrieved earlier.

    To edit this file, use the following command:

    sudo nano /etc/bind/named.conf.local

    To save the file, use Ctrl+X, Y, and then Enter.

  6. To start Bind, use the following command:

    sudo service bind9 restart
  7. To verify that Bind can resolve the names of resources in your on-premises network, use the following commands:

    sudo apt install dnsutils
    nslookup dns.mynetwork.net 10.0.0.4

    [!IMPORTANT] Replace dns.mynetwork.net with the fully qualified domain name (FQDN) of a resource in your on-premises network.

    Replace 10.0.0.4 with the internal IP address of your custom DNS server in the virtual network.

    The response appears similar to the following text:

     Server:         10.0.0.4
     Address:        10.0.0.4#53
    
     Non-authoritative answer:
     Name:   dns.mynetwork.net
     Address: 192.168.0.4
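
Steps 3 through 5 ask you to substitute the address ranges in goodclients and the DNS suffix of the virtual network. As a rough aid for checking those substitutions before you restart Bind, the Python sketch below derives the suffix from the hostname -f output and tests whether a client address falls inside the goodclients ranges. The helper names are hypothetical, the ranges are the sample values from step 3, and the sketch ignores the localhost and localnets entries.

```python
import ipaddress

# Sample ranges from the goodclients ACL; replace with your own.
GOOD_CLIENTS = ["10.0.0.0/16", "10.1.0.0/16"]


def dns_suffix(fqdn):
    """Drop the host label from an FQDN, leaving the DNS suffix."""
    host, _, suffix = fqdn.strip().rstrip(".").partition(".")
    if not host or not suffix:
        raise ValueError(f"not a fully qualified name: {fqdn!r}")
    return suffix


def is_good_client(address, ranges=GOOD_CLIENTS):
    """Return True if the address is inside one of the CIDR ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)


print(dns_suffix("dnsproxy.icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net"))
print(is_good_client("10.0.0.4"))    # inside the virtual network range
print(is_good_client("172.16.0.9"))  # outside both sample ranges
```

A client that fails the range check would be refused by the allow-query setting, so an unexpected False here usually points to a range in goodclients that was not replaced.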
    

Configure the virtual network to use the custom DNS server

To configure the virtual network to use the custom DNS server instead of the Azure recursive resolver, use the following steps:

  1. In the Azure portal, select the virtual network, and then select DNS Servers.

  2. Select Custom, and enter the internal IP address of the custom DNS server. Finally, select Save.

    Set the custom DNS server for the network

Configure the on-premises DNS server

In the previous section, you configured the custom DNS server to forward requests to the on-premises DNS server. Next, you must configure the on-premises DNS server to forward requests to the custom DNS server.

For specific steps on how to configure your DNS server, consult the documentation for your DNS server software. Look for the steps on how to configure a conditional forwarder.

A conditional forwarder only forwards requests for a specific DNS suffix. In this case, you must configure a forwarder for the DNS suffix of the virtual network. Requests for this suffix should be forwarded to the IP address of the custom DNS server.

The following text is an example of a conditional forwarder configuration for the Bind DNS software:

zone "icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net" {
	type forward;
	forwarders {10.0.0.4;}; # The custom DNS server's internal IP address
};
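
Both forwarder stanzas in this document have the same shape and differ only in the zone name and the forwarder address. As an illustration, the following Python sketch fills in those two values. render_forward_zone is a hypothetical helper, not part of Bind, and its output should still be checked with your DNS server's own tooling.

```python
import ipaddress


def render_forward_zone(dns_suffix, forwarder_ip):
    """Return a Bind 'type forward' zone stanza for one DNS suffix."""
    # Fail early on a malformed address instead of writing a bad config.
    ipaddress.ip_address(forwarder_ip)
    zone = dns_suffix.strip().strip(".")
    if not zone:
        raise ValueError("the DNS suffix must not be empty")
    return (
        f'zone "{zone}" {{\n'
        f"\ttype forward;\n"
        f"\tforwarders {{{forwarder_ip};}};\n"
        f"}};\n"
    )


# On-premises DNS server: forward the virtual network suffix to the
# custom DNS server's internal IP address (sample values from above).
print(render_forward_zone(
    "icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net", "10.0.0.4"))
```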

For information on using DNS on Windows Server 2016, see the Add-DnsServerConditionalForwarderZone documentation...

Once you have configured the on-premises DNS server, you can use nslookup from the on-premises network to verify that you can resolve names in the virtual network. The following command is an example:

nslookup dnsproxy.icb0d0thtw0ebifqt0g1jycdxd.ex.internal.cloudapp.net 192.168.0.1

This example uses the on-premises DNS server at 192.168.0.1 to resolve the name of the custom DNS server. Replace the IP address with the one for your on-premises DNS server. Replace the dnsproxy name with the fully qualified domain name of your custom DNS server.

Optional: Control network traffic

You can use network security groups (NSG) or user-defined routes (UDR) to control network traffic. NSGs allow you to filter inbound and outbound traffic, and allow or deny the traffic. UDRs allow you to control how traffic flows between resources in the virtual network, the internet, and the on-premises network.

Warning

HDInsight requires inbound access from specific IP addresses in the Azure cloud, and unrestricted outbound access. When using NSGs or UDRs to control traffic, you must perform the following steps:

  1. Find the IP addresses for the location that contains your virtual network. For a list of required IPs by location, see Required IP addresses.

  2. Allow inbound traffic from the IP addresses identified in step 1.

    • If you are using NSG: Allow inbound traffic on port 443 for the IP addresses.
    • If you are using UDR: Set the Next Hop type of the route to Internet for the IP addresses.
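
For the NSG case, the two steps above amount to one inbound allow rule on port 443 per required address. The following Python sketch only builds the corresponding az network nsg rule create argument lists so that they can be reviewed before anything is run. The resource group, NSG name, rule names, starting priority, and the 203.0.113.x addresses (a range reserved for documentation) are placeholders, not real HDInsight addresses; take the real list from the Required IP addresses document.

```python
def nsg_allow_commands(resource_group, nsg_name, required_ips, first_priority=300):
    """Build one 'az network nsg rule create' command per required IP."""
    commands = []
    for offset, ip in enumerate(required_ips):
        commands.append([
            "az", "network", "nsg", "rule", "create",
            "--resource-group", resource_group,
            "--nsg-name", nsg_name,
            "--name", f"hdi-required-{offset + 1}",
            # Each rule in an NSG needs its own priority value.
            "--priority", str(first_priority + offset),
            "--direction", "Inbound",
            "--access", "Allow",
            "--protocol", "Tcp",
            "--source-address-prefixes", ip,
            "--destination-port-ranges", "443",
        ])
    return commands


# Placeholder values only; 203.0.113.0/24 is reserved for documentation.
for cmd in nsg_allow_commands("my-rg", "my-nsg", ["203.0.113.10", "203.0.113.11"]):
    print(" ".join(cmd))
```

Returning argument lists rather than running the commands keeps the sketch free of side effects; each list could later be passed to subprocess.run once the values have been checked.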

For an example of using Azure PowerShell or the Azure CLI to create NSGs, see the Extend HDInsight with Azure Virtual Networks document.

Create the HDInsight cluster

Warning

You must configure the custom DNS server before installing HDInsight in the virtual network.

Use the steps in the Create an HDInsight cluster using the Azure portal document to create an HDInsight cluster.

Warning

  • During cluster creation, you must choose the location that contains your virtual network.

  • In the Advanced settings part of configuration, you must select the virtual network and subnet that you created earlier.

Connecting to HDInsight

Most documentation on HDInsight assumes that you have access to the cluster over the internet. For example, it assumes that you can connect to the cluster at https://CLUSTERNAME.azurehdinsight.net. This address uses the public gateway, which is not available if you have used NSGs or UDRs to restrict access from the internet.

Some documentation also references headnodehost when connecting to the cluster from an SSH session. This address is only available from nodes within a cluster, and is not usable on clients connected over the virtual network.

To directly connect to HDInsight through the virtual network, use the following steps:

  1. To discover the internal fully qualified domain names of the HDInsight cluster nodes, use either of the following methods. The first is an Azure PowerShell script; the second is an equivalent Azure CLI command:

    $resourceGroupName = "The resource group that contains the virtual network used with HDInsight"
    
    $clusterNICs = Get-AzureRmNetworkInterface -ResourceGroupName $resourceGroupName | where-object {$_.Name -like "*node*"}
    
    $nodes = @()
    foreach($nic in $clusterNICs) {
    	$node = new-object System.Object
    	$node | add-member -MemberType NoteProperty -name "Type" -value $nic.Name.Split('-')[1]
    	$node | add-member -MemberType NoteProperty -name "InternalIP" -value $nic.IpConfigurations.PrivateIpAddress
    	$node | add-member -MemberType NoteProperty -name "InternalFQDN" -value $nic.DnsSettings.InternalFqdn
    	$nodes += $node
    }
    $nodes | sort-object Type

    az network nic list --resource-group <resourcegroupname> --output table --query "[?contains(name,'node')].{NICname:name,InternalIP:ipConfigurations[0].privateIpAddress,InternalFQDN:dnsSettings.internalFqdn}"
    
  2. To determine the port that a service is available on, see the Ports used by Apache Hadoop services on HDInsight document.

    [!IMPORTANT] Some services hosted on the head nodes are only active on one node at a time. If you try accessing a service on one head node and it fails, switch to the other head node.

    For example, Apache Ambari is only active on one head node at a time. If you try accessing Ambari on one head node and it returns a 404 error, then it is running on the other head node.
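
Because a service such as Ambari runs on only one head node at a time, a client on the virtual network can try each head node in turn. The sketch below shows that retry pattern in Python. The host names are made-up placeholders for the internal FQDNs discovered in step 1, port 8080 is an assumed example to confirm against the ports document, and the probe is passed in as a function so the selection logic can be exercised without a cluster.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=5):
    """Return True unless the URL is unreachable or answers 404."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError as err:
        # Assumption: any HTTP status other than 404 (for example 401)
        # still means the service is running on this node.
        return err.code != 404
    except (urllib.error.URLError, OSError):
        return False


def first_active(urls, probe=http_probe):
    """Return the first URL whose probe succeeds, or None."""
    for url in urls:
        if probe(url):
            return url
    return None


head_nodes = [
    "http://hn0-example.internal.example:8080",
    "http://hn1-example.internal.example:8080",
]
# Stand-in probe so the example runs without a cluster: pretend that
# only the second head node currently hosts the service.
fake_probe = lambda url: url.startswith("http://hn1-")
print(first_active(head_nodes, probe=fake_probe))
```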

Next steps