
Commit 4e6bd38
hadoop streaming - no input required, dummy output
spelluru committed Nov 9, 2015
1 parent 6ee5125 commit 4e6bd38
Showing 1 changed file with 4 additions and 4 deletions.
@@ -34,7 +34,6 @@ The HDInsight Streaming Activity in a Data Factory [pipeline](data-factory-creat
"name": "RunHadoopStreamingJob",
"description": "Run a Hadoop streaming job",
"type": "HDInsightStreaming",
"getDebugInfo": "Failure",
"inputs": [ ],
"outputs": [ {"name": "OutputTable"} ],
"linkedServiceName": "HDInsightLinkedService",
@@ -70,11 +69,12 @@ Note the following:
2. Set the type of the activity to **HDInsightStreaming**.
3. For the **mapper** property, specify the name of the mapper executable. In the above example, cat.exe is the mapper executable.
4. For the **reducer** property, specify the name of the reducer executable. In the above example, wc.exe is the reducer executable.
5. For the **input** type property, specify the input file (including the location) for the mapper. In the example, "wasb://adfsample@<account name>.blob.core.windows.net/example/data/gutenberg/davinci.txt": adfsample is the blob container, example/data/gutenberg is the folder, and davinci.txt is the blob.
6. For the **output** type property, specify the output file (including the location) for the reducer. The output of the Hadoop Streaming job is written to the location specified for this property.
7. In the **filePaths** section, specify the paths for the mapper and reducer executables. In the example: "adfsample/example/apps/wc.exe", adfsample is the blob container, example/apps is the folder, and wc.exe is the executable.
8. For the **fileLinkedService** property, specify the Azure Storage linked service that represents the Azure storage account that contains the files specified in the filePaths section (a sketch of such a linked service follows this list).
9. For the **arguments** property, specify the arguments for the streaming job.
10. The **getDebugInfo** property is an optional element. When it is set to Failure, the logs are downloaded only on failure. When it is set to All, logs are always downloaded irrespective of the execution status.
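As referenced in step 8, here is a minimal sketch of an Azure Storage linked service; the StorageLinkedService name and the placeholder credentials are assumptions, not values from the original file:

```json
{
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account name>;AccountKey=<account key>"
        }
    }
}
```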

> [AZURE.NOTE] As shown in the example, you must specify an output dataset for the **outputs** property of the Hadoop Streaming Activity. This is just a dummy dataset that is required to drive the pipeline schedule. You do not need to specify any input dataset for the **inputs** property.
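A minimal sketch of such a dummy output dataset, assuming a blob dataset named OutputTable as in the example; the folder path, linked service name, and availability values are illustrative:

```json
{
    "name": "OutputTable",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "folderPath": "adfsample/example/data/StreamingOutput/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": "\t"
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
```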
