title	description	services	author	ms.reviewer	ms.service	ms.custom	ms.topic	ms.date	ms.author
MapReduce and SSH connection with Apache Hadoop in HDInsight - Azure	Learn how to use SSH to run MapReduce jobs using Apache Hadoop on HDInsight.	hdinsight	hrasheed-msft	jasonh	hdinsight	hdinsightactive	conceptual	04/10/2018	hrasheed

Use MapReduce with Apache Hadoop on HDInsight with SSH

[!INCLUDE mapreduce-selector]

Learn how to submit MapReduce jobs from a Secure Shell (SSH) connection to HDInsight.

Note

If you are already familiar with using Linux-based Apache Hadoop servers, but you are new to HDInsight, see Linux-based HDInsight tips.

Prerequisites

A Linux-based HDInsight (Hadoop on HDInsight) cluster

[!IMPORTANT] Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see HDInsight retirement on Windows.
An SSH client. For more information, see Use SSH with HDInsight.

Connect with SSH

Connect to the cluster using SSH. For example, the following command connects to a cluster named myhdinsight as the sshuser account:

ssh sshuser@myhdinsight-ssh.azurehdinsight.net

If you use a certificate key for SSH authentication, you may need to specify the location of the private key on your client system, for example:

ssh -i ~/mykey.key sshuser@myhdinsight-ssh.azurehdinsight.net

If you use a password for SSH authentication, you need to provide the password when prompted.
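
Both commands above use the same host name pattern: the cluster name followed by -ssh.azurehdinsight.net. As a minimal sketch, a small shell function can assemble the connection target from the cluster name and the SSH account. The name ssh_target is hypothetical and introduced only for illustration; myhdinsight and sshuser are the example values from this article.

```shell
# ssh_target is a hypothetical helper, not part of HDInsight or OpenSSH.
# It prints "<user>@<cluster>-ssh.azurehdinsight.net" for the given values.
ssh_target() {
  cluster="$1"
  user="$2"
  printf '%s@%s-ssh.azurehdinsight.net\n' "$user" "$cluster"
}

# Example values taken from this article
ssh_target myhdinsight sshuser

# To open the connection, pass the result to ssh (uncomment to use):
# ssh "$(ssh_target myhdinsight sshuser)"
# ssh -i ~/mykey.key "$(ssh_target myhdinsight sshuser)"
```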

For more information on using SSH with HDInsight, see Use SSH with HDInsight.

Use Hadoop commands

After you are connected to the HDInsight cluster, use the following command to start a MapReduce job:
```
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /example/data/gutenberg/davinci.txt /example/data/WordCountOutput
```
This command starts the wordcount class, which is contained in the hadoop-mapreduce-examples.jar file. It uses the /example/data/gutenberg/davinci.txt document as input, and output is stored at /example/data/WordCountOutput.
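
The command has a fixed shape: the examples jar, the class name, the input path, and the output directory. The sketch below assembles it from variables so the paths are easier to change. It also checks for the output directory first, on the assumption that the job may be run more than once: Hadoop does not overwrite an existing output directory, and the job fails if it is already there. The name wordcount_cmd is hypothetical, and the job itself is only attempted when the yarn and hdfs commands are available.

```shell
JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
INPUT=/example/data/gutenberg/davinci.txt
OUTPUT=/example/data/WordCountOutput

# wordcount_cmd is a hypothetical helper: it prints the job command
# for a given input path and output directory without running it.
wordcount_cmd() {
  printf 'yarn jar %s wordcount %s %s\n' "$JAR" "$1" "$2"
}

# Show the command that would be run
wordcount_cmd "$INPUT" "$OUTPUT"

# Only attempt the job on a machine that has the Hadoop clients installed
if command -v yarn >/dev/null 2>&1 && command -v hdfs >/dev/null 2>&1; then
  if hdfs dfs -test -d "$OUTPUT"; then
    # Assumption: earlier results should be kept, so stop instead of deleting
    echo "Output directory $OUTPUT already exists." >&2
    echo "Choose another directory or remove it with: hdfs dfs -rm -r $OUTPUT" >&2
  else
    yarn jar "$JAR" wordcount "$INPUT" "$OUTPUT"
  fi
fi
```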

[!NOTE] For more information about this MapReduce job and the example data, see Use MapReduce in Hadoop on HDInsight.

The job emits details as it processes, and it returns information similar to the following text when the job completes:
```
 File Input Format Counters
 Bytes Read=1395666
 File Output Format Counters
 Bytes Written=337623
```
When the job completes, use the following command to list the output files:
```
hdfs dfs -ls /example/data/WordCountOutput
```
This command displays two files, _SUCCESS and part-r-00000. The part-r-00000 file contains the output for this job.
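
The _SUCCESS file is an empty marker that Hadoop writes when a job finishes successfully, so a script can test for it before reading the results. The following is a minimal sketch of that check. The name has_success_marker is hypothetical; it only scans a directory listing supplied on standard input.

```shell
# has_success_marker is a hypothetical helper: it succeeds if any line of
# the listing on standard input ends with "/_SUCCESS".
has_success_marker() {
  grep -q '/_SUCCESS$'
}

# Only query HDFS on a machine that has the hdfs command installed
if command -v hdfs >/dev/null 2>&1; then
  if hdfs dfs -ls /example/data/WordCountOutput | has_success_marker; then
    echo "Job output is complete"
  else
    echo "No _SUCCESS marker found in /example/data/WordCountOutput" >&2
  fi
fi
```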

[!NOTE] Some MapReduce jobs may split the results across multiple part-r-##### files. If so, the ##### suffix indicates the order of the files.
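
To illustrate the naming scheme in the note, the sketch below formats the five-digit suffix for a given file index and, when run on the cluster, lists every part file with a single glob. The name part_file is hypothetical and exists only for this example.

```shell
# part_file is a hypothetical helper: it prints the name of the output
# file with the given zero-based index, using the five-digit suffix.
part_file() {
  printf 'part-r-%05d\n' "$1"
}

part_file 0    # prints part-r-00000
part_file 12   # prints part-r-00012

# On the cluster, a quoted glob lists every part file in one call.
# The quotes keep the local shell from expanding the pattern itself.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls '/example/data/WordCountOutput/part-r-*'
fi
```
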
To view the output, use the following command:
```
hdfs dfs -cat /example/data/WordCountOutput/part-r-00000
```
This command displays a list of the words contained in the /example/data/gutenberg/davinci.txt file and the number of times each word occurs. The following text is an example of the data that is contained in the file:
```
 wreathed        3
 wreathing       1
 wreaths         1
 wrecked         3
 wrenching       1
 wretched        6
 wriggling       1
```
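
The output is ordered by word, so the most frequent words are scattered through the file. As a rough sketch, the output can be piped through standard sort and head to rank words by count instead. The name top_words is hypothetical, and the sketch assumes each line holds a word and a count separated by whitespace, as in the sample above.

```shell
# top_words is a hypothetical helper: it reads "word count" lines on
# standard input and prints the N most frequent words (default 10).
top_words() {
  # Sort by the count column, highest first, then alphabetically by word
  sort -k2,2nr -k1,1 | head -n "${1:-10}"
}

# Local demonstration with the sample lines shown above
printf '%s\t%s\n' wreathed 3 wreathing 1 wreaths 1 wrecked 3 \
  wrenching 1 wretched 6 wriggling 1 | top_words 3

# On the cluster, feed it the job output instead (uncomment to use):
# hdfs dfs -cat /example/data/WordCountOutput/part-r-00000 | top_words 10
```

With the sample lines, wretched is listed first because it has the highest count, followed by wreathed and wrecked.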

Summary

As you can see, Hadoop commands provide an easy way to run MapReduce jobs in an HDInsight cluster and then view the job output.

Next steps

For general information about MapReduce jobs in HDInsight:

Use MapReduce on HDInsight Hadoop

For information about other ways you can work with Hadoop on HDInsight:

Use Hive with Hadoop on HDInsight
Use Pig with Hadoop on HDInsight
