Skip to content

Files

Latest commit

 

History

History
120 lines (89 loc) · 6.26 KB

apache-hadoop-use-sqoop-dotnet-sdk.md

File metadata and controls

120 lines (89 loc) · 6.26 KB
title description keywords ms.reviewer services author ms.service ms.custom ms.topic ms.date ms.author
Run Apache Sqoop jobs by using .NET and HDInsight - Azure
Learn how to use the HDInsight .NET SDK to run Apache Sqoop import and export between an Apache Hadoop cluster and an Azure SQL database.
sqoop job
jasonh
hdinsight
hrasheed-msft
hdinsight
hdinsightactive,hdiseo17may2017
conceptual
05/16/2018
hrasheed

Run Apache Sqoop jobs by using .NET SDK for Apache Hadoop in HDInsight

[!INCLUDE sqoop-selector]

Learn how to use the Azure HDInsight .NET SDK to run Apache Sqoop jobs in HDInsight to import and export between an HDInsight cluster and an Azure SQL database or SQL Server database.

Note

Although you can use the procedures in this article with either a Windows-based or Linux-based HDInsight cluster, they work only from a Windows client. To choose other methods, use the tab selector at the top of this article.

Prerequisites

Before you begin this tutorial, you must have the following item:

Use Sqoop on HDInsight clusters with the .NET SDK

The HDInsight .NET SDK provides .NET client libraries, so that it's easier to work with HDInsight clusters from .NET. In this section, you create a C# console application to export the hivesampletable to the Azure SQL Database table that you created earlier in this tutorial.

Submit a Sqoop job

  1. Create a C# console application in Visual Studio.

  2. From the Visual Studio Package Manager console, import the package by running the following NuGet command:

     Install-Package Microsoft.Azure.Management.HDInsight.Job
    
  3. Use the following code in the Program.cs file:

     using System.Collections.Generic;
     using Microsoft.Azure.Management.HDInsight.Job;
     using Microsoft.Azure.Management.HDInsight.Job.Models;
     using Hyak.Common;
    
     namespace SubmitHDInsightJobDotNet
     {
         class Program
         {
             private static HDInsightJobManagementClient _hdiJobManagementClient;
    
             private const string ExistingClusterName = "<Your HDInsight Cluster Name>";
             private const string ExistingClusterUri = ExistingClusterName + ".azurehdinsight.net";
             private const string ExistingClusterUsername = "<Cluster Username>";
             private const string ExistingClusterPassword = "<Cluster User Password>";
    
             static void Main(string[] args)
             {
                 System.Console.WriteLine("The application is running ...");
    
                 var clusterCredentials = new BasicAuthenticationCloudCredentials { Username = ExistingClusterUsername, Password = ExistingClusterPassword };
                 _hdiJobManagementClient = new HDInsightJobManagementClient(ExistingClusterUri, clusterCredentials);
    
                 SubmitSqoopJob();
    
                 System.Console.WriteLine("Press ENTER to continue ...");
                 System.Console.ReadLine();
             }
    
             private static void SubmitSqoopJob()
             {
                 var sqlDatabaseServerName = "<SQLDatabaseServerName>";
                 var sqlDatabaseLogin = "<SQLDatabaseLogin>";
                 var sqlDatabaseLoginPassword = "<SQLDatabaseLoginPassword>";
                 var sqlDatabaseDatabaseName = "<DatabaseName>";
    
                 var tableName = "<TableName>";
                 var exportDir = "/tutorials/usesqoop/data";
    
                 // Connection string for using Azure SQL Database.
                 // Comment if using SQL Server
                 var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ".database.windows.net;user=" + sqlDatabaseLogin + "@" + sqlDatabaseServerName + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
                 // Connection string for using SQL Server.
                 // Uncomment if using SQL Server
                 //var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ";user=" + sqlDatabaseLogin + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
    
                 var parameters = new SqoopJobSubmissionParameters
                 {
                     Files = new List<string> { "/user/oozie/share/lib/sqoop/sqljdbc41.jar" }, // This line is required for Linux-based cluster.
                     Command = "export --connect " + connectionString + " --table " + tableName + "_mobile --export-dir " + exportDir + "_mobile --fields-terminated-by \\t -m 1"
                 };
    
                 System.Console.WriteLine("Submitting the Sqoop job to the cluster...");
                 var response = _hdiJobManagementClient.JobManagement.SubmitSqoopJob(parameters);
                 System.Console.WriteLine("Validating that the response is as expected...");
                 System.Console.WriteLine("Response status code is " + response.StatusCode);
                 System.Console.WriteLine("Validating the response object...");
                 System.Console.WriteLine("JobId is " + response.JobSubmissionJsonResponse.Id);
             }
         }
     }
    
  4. To run the program, select the F5 key.

Limitations

Linux-based HDInsight presents the following limitations:

  • Bulk export: The Sqoop connector that's used to export data to Microsoft SQL Server or Azure SQL Database does not currently support bulk inserts.

  • Batching: By using the -batch switch when it performs inserts, Sqoop performs multiple inserts instead of batching the insert operations.

Next steps

Now you have learned how to use Sqoop. To learn more, see: