The SQL Server Integration Services Feature Pack for Azure provides components to connect to Azure, transfer data between Azure and on-premises data sources, and process data stored in Azure.
Guidance on technologies used to move data to and/or from Azure Blob storage are linked here:
[AZURE.INCLUDE blob-storage-tool-selector]
Once customers have moved on-premises data into the cloud, they can access it from any Azure service to leverage the full power of the suite of Azure technologies. It may be used, for example, in Azure Machine Learning or on an HDInsight cluster.
This will typically be the first step for the SQL and HDInsight walkthroughs.
For a discussion of canonical scenarios that use SSIS to accomplish business needs common in hybrid data integration scenarios, see Doing more with SQL Server Integration Services Feature Pack for Azure blog.
[AZURE.NOTE] For a complete introduction to Azure blob storage, please refer to Azure Blob Basics and Azure Blob Service.
To perform the tasks described in this article, you must have an Azure subscription and an Azure storage account set up. You must know your Azure storage account name and account key in order to upload or download data.
- To set up an Azure subscription, see Free one-month trial.
- For instructions on creating a storage account and for getting account and key information, see About Azure storage accounts.
To use the SSIS connectors you must download:
- SQL Server 2014 or 2016 Standard (or above): Install includes includes SQL Server Integration Services.
- Microsoft SQL Server 2014 or 2016 Integration Services Feature Pack for Azure: These can be downloaded, respectively, from the SQL Server 2014 Integration Services and SQL Server 2016 Integration Services pages.
[AZURE.NOTE] SSIS is installed with SQL Server, but is not included in the Express version. For information on what applications are included in various editions of SQL Server, see SQL Server Editions
For training materials on SSIS, see Hands On Training for SSIS
For information on how to get up-and-running using SISS to build simple extraction, transformation, and load (ETL) packages, see SSIS Tutorial: Creating a Simple ETL Package.
The example described here use a publicly available dataset -- the NYC Taxi Trips dataset. The dataset consists of about 173 million taxi rides in NYC in the year 2013. There are two types of data : trip details data and fare data. As there is a file for each month, we have 24 files in all, each of which is approximately 2GB uncompressed.
To move data using the SSIS feature pack from on-premises to Azure blob storage, we use an instance of the Azure Blob Upload Task, shown below:
The parameters that the task uses are described here:
Field | Description |
---|---|
AzureStorageConnection | Specifies an existing Azure Storage Connection Manager or creates a new one that refers to an Azure storage account that points to where the blob files are hosted. |
BlobContainer | Specifies the name of the blob container that will hold the uploaded files as blobs. |
BlobDirectory | Specifies the blob directory where the uploaded file will be stored as a block blob. The blob directory is a virtual hierarchical structure. If the blob already exists, it will be replaced. |
LocalDirectory | Specifies the local directory that contains the files to be uploaded. |
FileName | Specifies a name filter to select files with the specified name pattern. For example, MySheet*.xls* includes files such as MySheet001.xls and MySheetABC.xlsx |
TimeRangeFrom/TimeRangeTo | Specifies a time range filter. Files modified after TimeRangeFrom and before TimeRangeTo will be included. |
[AZURE.NOTE] The AzureStorageConnection credentials need to be correct and the BlobContainer must exist before the transfer is attempted.
To download data from Azure blob storage to on-premise storage with SSIS, use an instance of the Azure Blob Upload Task.
##More advanced SSIS-Azure scenarios We note here that the SSIS feature pack allows for more complex flows to be handled by packaging tasks together. For example, the blob data could feed directly into an HDInsight cluster whose output could be downloaded back to a blob and then to an on-premises storage. SSIS can run Hive and Pig jobs on an HDInsight cluster using additional SSIS connectors:
- To run a Hive script on an Azure HDInsight cluster with SSIS, use Azure HDInsight Hive Task.
- To run a Pig script on an Azure HDInsight cluster with SSIS, use Azure HDInsight Pig Task.