---
title: MapReduce and Remote Desktop with Hadoop in HDInsight - Azure | Microsoft Docs
description: Learn how to use Remote Desktop to connect to Hadoop on HDInsight and run MapReduce jobs.
services: hdinsight
documentationcenter: ''
author: Blackmist
manager: jhubbard
editor: cgronlun
tags: azure-portal
ms.assetid: 9d3a7b34-7def-4c2e-bb6c-52682d30dee8
ms.service: hdinsight
ms.devlang: na
ms.topic: article
ms.tgt_pltfrm: na
ms.workload: big-data
ms.date: 01/12/2017
ms.author: larryfr
ROBOTS: NOINDEX
---

# Use MapReduce in Hadoop on HDInsight with Remote Desktop

[!INCLUDE mapreduce-selector]

In this article, you learn how to connect to a Hadoop on HDInsight cluster by using Remote Desktop, and then run MapReduce jobs by using the Hadoop command.

> [!IMPORTANT]
> Remote Desktop is only available on Windows-based HDInsight clusters. Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see HDInsight retirement on Windows.
>
> For HDInsight 3.4 or greater, see Use MapReduce with SSH for information on connecting to the HDInsight cluster and running MapReduce jobs.

## Prerequisites

To complete the steps in this article, you will need the following:

* A Windows-based HDInsight (Hadoop on HDInsight) cluster
* A client computer running Windows 10, Windows 8, or Windows 7

## Connect with Remote Desktop

Enable Remote Desktop for the HDInsight cluster, then connect to it by following the instructions at Connect to HDInsight clusters using RDP.

## Use the Hadoop command

When you are connected to the desktop for the HDInsight cluster, use the following steps to run a MapReduce job by using the Hadoop command:

1. From the HDInsight desktop, start the Hadoop Command Line. This opens a new command prompt in the `c:\apps\dist\hadoop-<version number>` directory.

    > [!NOTE]
    > The version number changes as Hadoop is updated. The `HADOOP_HOME` environment variable can be used to find the path. For example, `cd %HADOOP_HOME%` changes to the Hadoop directory without requiring you to know the version number.

2. To use the Hadoop command to run an example MapReduce job, use the following command:

        hadoop jar hadoop-mapreduce-examples.jar wordcount wasb:///example/data/gutenberg/davinci.txt wasb:///example/data/WordCountOutput

    This starts the wordcount class, which is contained in the hadoop-mapreduce-examples.jar file in the current directory. As input, it uses the wasb:///example/data/gutenberg/davinci.txt document, and the output is stored at wasb:///example/data/WordCountOutput.

    > [!NOTE]
    > For more information about this MapReduce job and the example data, see Use MapReduce in HDInsight Hadoop.

3. The job emits details as it is processed, and returns information similar to the following when it completes:

        File Input Format Counters
        Bytes Read=1395666
        File Output Format Counters
        Bytes Written=337623
    
4. When the job is complete, use the following command to list the output files stored at wasb:///example/data/WordCountOutput:

        hadoop fs -ls wasb:///example/data/WordCountOutput

    This should display two files, `_SUCCESS` and `part-r-00000`. The `part-r-00000` file contains the output for this job.

    > [!NOTE]
    > Some MapReduce jobs may split the results across multiple part-r-##### files. If so, the ##### suffix indicates the order of the files.

5. To view the output, use the following command:

        hadoop fs -cat wasb:///example/data/WordCountOutput/part-r-00000

    This displays a list of the words contained in the wasb:///example/data/gutenberg/davinci.txt file, along with the number of times each word occurred. The following is an example of the data contained in the file:

        wreathed        3
        wreathing       1
        wreaths         1
        wrecked         3
        wrenching       1
        wretched        6
        wriggling       1
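To make the output format in the steps above concrete, the following Python sketch mimics, locally, what the wordcount job computes. This is an illustration only, not the Hadoop implementation; the sample text is a made-up stand-in for the davinci.txt data.

```python
from collections import Counter

# Local illustration of what the wordcount example job computes --
# not the Hadoop implementation. The sample text is a made-up
# stand-in for wasb:///example/data/gutenberg/davinci.txt.
text = "wreathed wreaths wrecked wreathed wreathing wreathed wrecked"

# Map phase: emit a (word, 1) pair for each word.
# Reduce phase: sum the counts for each word.
counts = Counter(text.split())

# Format the result the way a part-r-00000 file stores it:
# one line per word, with the word and its count separated by a tab.
lines = ["%s\t%d" % (word, n) for word, n in sorted(counts.items())]
print("\n".join(lines))
```

Running this prints `wreathed 3`, `wreathing 1`, `wreaths 1`, and `wrecked 2` as tab-separated lines, matching the layout of the part-r-00000 output shown in step 5.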

## Summary

The Hadoop command provides a straightforward way to run MapReduce jobs on an HDInsight cluster and then view the job output.

## Next steps

For general information about MapReduce jobs in HDInsight:

For information about other ways you can work with Hadoop on HDInsight: