title: Transform data using Hadoop Hive activity in Azure Data Factory | Microsoft Docs
description: Learn how you can use the Hive Activity in an Azure data factory to run Hive queries on an on-demand/your own HDInsight cluster.
services: data-factory
documentationcenter: ''
author: douglaslMS
manager: craigg
ms.service: data-factory
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: conceptual
ms.date: 01/16/2018
ms.author: douglasl

Transform data using Hadoop Hive activity in Azure Data Factory


The HDInsight Hive activity in a Data Factory pipeline executes Hive queries on your own HDInsight cluster or on an on-demand HDInsight cluster. This article builds on the data transformation activities article, which presents a general overview of data transformation and the supported transformation activities.

If you are new to Azure Data Factory, read Introduction to Azure Data Factory and complete the Tutorial: transform data before reading this article.

Syntax

{
    "name": "Hive Activity",
    "description": "description",
    "type": "HDInsightHive",
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scriptLinkedService": {
            "referenceName": "MyAzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "scriptPath": "MyAzureStorage\\HiveScripts\\MyHiveScript.hql",
        "getDebugInfo": "Failure",
        "arguments": [
            "SampleHadoopJobArgument1"
        ],
        "defines": {
            "param1": "param1Value"
        }
    }   
}
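The defines object in the syntax above supplies values that the Hive script can reference. Hive scripts run by this activity commonly reference a define with Hive's ${hiveconf:name} syntax; that convention is an assumption here, so confirm it against your own scripts. Under that assumption, the following Python sketch previews what a script looks like once the define is substituted. The helper preview_hive_script and the sample query are hypothetical and meant only for local checking; this is not how Data Factory or Hive actually performs the substitution.

```python
import re

# Matches Hive-style references such as ${hiveconf:param1}.
HIVECONF_REF = re.compile(r"\$\{hiveconf:([A-Za-z0-9_.]+)\}")


def preview_hive_script(script: str, defines: dict[str, str]) -> str:
    """Return the script with each ${hiveconf:key} replaced by defines[key].

    Hypothetical helper for local previews only. References whose key is
    not present in `defines` are left unchanged.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Fall back to the original text when the key is not defined.
        return defines.get(key, match.group(0))

    return HIVECONF_REF.sub(substitute, script)


# The "defines" object from the syntax example.
defines = {"param1": "param1Value"}

# Hypothetical contents of the Hive script file.
script = "SELECT * FROM logs WHERE source = '${hiveconf:param1}';"

print(preview_hive_script(script, defines))
# prints: SELECT * FROM logs WHERE source = 'param1Value';
```

Leaving unknown references untouched, rather than raising an error, makes a missing define easy to spot in the previewed output.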

Syntax details

| Property | Description | Required |
| --- | --- | --- |
| name | Name of the activity. | Yes |
| description | Text describing what the activity is used for. | No |
| type | For the Hive activity, the activity type is HDInsightHive. | Yes |
| linkedServiceName | Reference to the HDInsight cluster registered as a linked service in Data Factory. To learn about this linked service, see the Compute linked services article. | Yes |
| scriptLinkedService | Reference to an Azure Storage linked service used to store the Hive script to be executed. If you don't specify this linked service, the Azure Storage linked service defined in the HDInsight linked service is used. | No |
| scriptPath | Path to the script file stored in the Azure Storage account referred to by scriptLinkedService. The file name is case-sensitive. | Yes |
| getDebugInfo | Specifies when the log files are copied to the Azure Storage account used by the HDInsight cluster or specified by scriptLinkedService. Allowed values: None, Always, or Failure. Default value: None. | No |
| arguments | Specifies an array of arguments for a Hadoop job. The arguments are passed as command-line arguments to each task. | No |
| defines | Specifies parameters as key/value pairs for referencing within the Hive script. | No |
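The required properties and allowed getDebugInfo values in the table lend themselves to a quick pre-deployment check. The following Python sketch assumes the activity definition has already been loaded as a dictionary shaped like the syntax example: name, type, and linkedServiceName at the top level, with scriptPath and the other script settings under typeProperties. The function check_hive_activity is hypothetical and covers only the rules in this table, not the full Data Factory schema; it also compares the type value exactly, which may be stricter than the service itself.

```python
# Rules taken from the property table; everything else is unchecked.
REQUIRED_TOP_LEVEL = ("name", "type", "linkedServiceName")
REQUIRED_TYPE_PROPERTIES = ("scriptPath",)
DEBUG_INFO_VALUES = {"None", "Always", "Failure"}


def check_hive_activity(activity: dict) -> list[str]:
    """Return a list of problems found in a Hive activity definition.

    Hypothetical pre-deployment check; an empty list means no problems
    were found against the rules above.
    """
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in activity:
            problems.append(f"missing required property: {key}")
    # Only report a wrong type when a type is actually present.
    if "type" in activity and activity["type"] != "HDInsightHive":
        problems.append("type must be HDInsightHive")

    type_props = activity.get("typeProperties", {})
    for key in REQUIRED_TYPE_PROPERTIES:
        if key not in type_props:
            problems.append(f"missing required property: typeProperties.{key}")

    # getDebugInfo is optional; when it is absent, the default is None.
    debug_info = type_props.get("getDebugInfo", "None")
    if debug_info not in DEBUG_INFO_VALUES:
        problems.append(f"invalid getDebugInfo value: {debug_info}")

    if not isinstance(type_props.get("arguments", []), list):
        problems.append("arguments must be an array")
    if not isinstance(type_props.get("defines", {}), dict):
        problems.append("defines must be an object of key/value pairs")
    return problems


# Sample definition with a deliberately invalid getDebugInfo value.
activity = {
    "name": "Hive Activity",
    "type": "HDInsightHive",
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "scriptPath": "MyAzureStorage\\HiveScripts\\MyHiveScript.hql",
        "getDebugInfo": "Sometimes",
    },
}

print(check_hive_activity(activity))
# prints: ['invalid getDebugInfo value: Sometimes']
```

Returning a list instead of raising on the first problem lets a build step report every issue in a definition in one pass.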

Next steps

See the following articles that explain how to transform data in other ways: