title | description | services | documentationcenter | author | manager | ms.reviewer | ms.service | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Copy data to or from Azure SQL Database by using Data Factory | Microsoft Docs |
Learn how to copy data from supported source data stores to Azure SQL Database or from SQL Database to supported sink data stores by using Data Factory. |
data-factory |
linda33wj |
craigg |
douglasl |
data-factory |
data-services |
na |
na |
conceptual |
11/08/2018 |
jingwang |
[!div class="op_single_selector" title1="Select the version of Data Factory service you use:"]
This article explains how to use Copy Activity in Azure Data Factory to copy data from or to Azure SQL Database. It builds on the Copy Activity overview article, which presents a general overview of Copy Activity.
You can copy data from or to Azure SQL Database to any supported sink data store. And you can copy data from any supported source data store to Azure SQL Database. For a list of data stores that are supported as sources or sinks by Copy Activity, see the Supported data stores and formats table.
Specifically, this Azure SQL Database connector supports these functions:
- Copy data by using SQL authentication and Azure Active Directory (Azure AD) Application token authentication with a service principal or managed identities for Azure resources.
- As a source, retrieve data by using a SQL query or stored procedure.
- As a sink, append data to a destination table or invoke a stored procedure with custom logic during the copy.
Important
If you copy data by using Azure Data Factory Integration Runtime, configure an Azure SQL server firewall so that Azure Services can access the server. If you copy data by using a self-hosted integration runtime, configure the Azure SQL server firewall to allow the appropriate IP range. This range includes the machine's IP that is used to connect to Azure SQL Database.
[!INCLUDE data-factory-v2-connector-get-started]
The following sections provide details about properties that are used to define Data Factory entities specific to an Azure SQL Database connector.
These properties are supported for an Azure SQL Database linked service:
Property | Description | Required |
---|---|---|
type | The type property must be set to AzureSqlDatabase. | Yes |
connectionString | Specify information needed to connect to the Azure SQL Database instance for the connectionString property. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes |
servicePrincipalId | Specify the application's client ID. | Yes, when you use Azure AD authentication with a service principal. |
servicePrincipalKey | Specify the application's key. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes, when you use Azure AD authentication with a service principal. |
tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. Retrieve it by hovering the mouse in the top-right corner of the Azure portal. | Yes, when you use Azure AD authentication with a service principal. |
connectVia | The integration runtime to be used to connect to the data store. You can use Azure Integration Runtime or a self-hosted integration runtime if your data store is located in a private network. If not specified, it uses the default Azure Integration Runtime. | No |
For different authentication types, refer to the following sections on prerequisites and JSON samples, respectively:
- SQL authentication
- Azure AD application token authentication: Service principal
- Azure AD application token authentication: Managed identities for Azure resources
Tip
If you hit error with error code as "UserErrorFailedToConnectToSqlServer" and message like "The session limit for the database is XXX and has been reached.", add Pooling=false
to your connection string and try again.
{
"name": "AzureSqlDbLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
}
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
To use a service principal-based Azure AD application token authentication, follow these steps:
-
Create an Azure Active Directory application from the Azure portal. Make note of the application name and the following values that define the linked service:
- Application ID
- Application key
- Tenant ID
-
Provision an Azure Active Directory administrator for your Azure SQL server on the Azure portal if you haven't already done so. The Azure AD administrator must be an Azure AD user or Azure AD group, but it can't be a service principal. This step is done so that, in the next step, you can use an Azure AD identity to create a contained database user for the service principal.
-
Create contained database users for the service principal. Connect to the database from or to which you want to copy data by using tools like SSMS, with an Azure AD identity that has at least ALTER ANY USER permission. Run the following T-SQL:
CREATE USER [your application name] FROM EXTERNAL PROVIDER;
-
Grant the service principal needed permissions as you normally do for SQL users or others. Run the following code:
EXEC sp_addrolemember [role name], [your application name];
-
Configure an Azure SQL Database linked service in Azure Data Factory.
{
"name": "AzureSqlDbLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;Connection Timeout=30"
},
"servicePrincipalId": "<service principal id>",
"servicePrincipalKey": {
"type": "SecureString",
"value": "<service principal key>"
},
"tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>"
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
A data factory can be associated with a managed identity for Azure resources that represents the specific data factory. You can use this service identity for Azure SQL Database authentication. The designated factory can access and copy data from or to your database by using this identity.
To use MSI-based Azure AD application token authentication, follow these steps:
-
Create a group in Azure AD. Make the factory MSI a member of the group.
-
Find the data factory service identity from the Azure portal. Go to your data factory's Properties. Copy the SERVICE IDENTITY ID.
-
Install the Azure AD PowerShell module. Sign in by using the
Connect-AzureAD
command. Run the following commands to create a group and add the data factory MSI as a member.
$Group = New-AzureADGroup -DisplayName "<your group name>" -MailEnabled $false -SecurityEnabled $true -MailNickName "NotSet" Add-AzureAdGroupMember -ObjectId $Group.ObjectId -RefObjectId "<your data factory service identity ID>"
-
-
Provision an Azure Active Directory administrator for your Azure SQL server on the Azure portal if you haven't already done so. The Azure AD administrator can be an Azure AD user or Azure AD group. If you grant the group with MSI an admin role, skip steps 3 and 4. The administrator will have full access to the database.
-
Create contained database users for the Azure AD group. Connect to the database from or to which you want to copy data by using tools like SSMS, with an Azure AD identity that has at least ALTER ANY USER permission. Run the following T-SQL:
CREATE USER [your AAD group name] FROM EXTERNAL PROVIDER;
-
Grant the Azure AD group needed permissions as you normally do for SQL users and others. For example, run the following code:
EXEC sp_addrolemember [role name], [your AAD group name];
-
Configure an Azure SQL Database linked service in Azure Data Factory.
{
"name": "AzureSqlDbLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;Connection Timeout=30"
}
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the Azure SQL Database dataset.
To copy data from or to Azure SQL Database, set the type property of the dataset to AzureSqlTable. The following properties are supported:
Property | Description | Required |
---|---|---|
type | The type property of the dataset must be set to AzureSqlTable. | Yes |
tableName | The name of the table or view in the Azure SQL Database instance that the linked service refers to. | No for source, Yes for sink |
{
"name": "AzureSQLDbDataset",
"properties":
{
"type": "AzureSqlTable",
"linkedServiceName": {
"referenceName": "<Azure SQL Database linked service name>",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "MyTable"
}
}
}
For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Azure SQL Database source and sink.
To copy data from Azure SQL Database, set the type property in the Copy Activity source to SqlSource. The following properties are supported in the Copy Activity source section:
Property | Description | Required |
---|---|---|
type | The type property of the Copy Activity source must be set to SqlSource. | Yes |
sqlReaderQuery | Use the custom SQL query to read data. Example: select * from MyTable . |
No |
sqlReaderStoredProcedureName | The name of the stored procedure that reads data from the source table. The last SQL statement must be a SELECT statement in the stored procedure. | No |
storedProcedureParameters | Parameters for the stored procedure. Allowed values are name or value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. |
No |
- If the sqlReaderQuery is specified for the SqlSource, Copy Activity runs this query against the Azure SQL Database source to get the data. Or you can specify a stored procedure. Specify sqlReaderStoredProcedureName and storedProcedureParameters if the stored procedure takes parameters.
- If you don't specify either sqlReaderQuery or sqlReaderStoredProcedureName, the columns defined in the structure section of the dataset JSON are used to construct a query.
select column1, column2 from mytable
runs against Azure SQL Database. If the dataset definition doesn't have the structure, all columns are selected from the table.
"activities":[
{
"name": "CopyFromAzureSQLDatabase",
"type": "Copy",
"inputs": [
{
"referenceName": "<Azure SQL Database input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "SELECT * FROM MyTable"
},
"sink": {
"type": "<sink type>"
}
}
}
]
"activities":[
{
"name": "CopyFromAzureSQLDatabase",
"type": "Copy",
"inputs": [
{
"referenceName": "<Azure SQL Database input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderStoredProcedureName": "CopyTestSrcStoredProcedureWithParameters",
"storedProcedureParameters": {
"stringData": { "value": "str3" },
"identifier": { "value": "$$Text.Format('{0:yyyy}', <datetime parameter>)", "type": "Int"}
}
},
"sink": {
"type": "<sink type>"
}
}
}
]
CREATE PROCEDURE CopyTestSrcStoredProcedureWithParameters
(
@stringData varchar(20),
@identifier int
)
AS
SET NOCOUNT ON;
BEGIN
select *
from dbo.UnitTestSrcTable
where dbo.UnitTestSrcTable.stringData != stringData
and dbo.UnitTestSrcTable.identifier != identifier
END
GO
To copy data to Azure SQL Database, set the type property in the Copy Activity sink to SqlSink. The following properties are supported in the Copy Activity sink section:
Property | Description | Required |
---|---|---|
type | The type property of the Copy Activity sink must be set to SqlSink. | Yes |
writeBatchSize | Inserts data into the SQL table when the buffer size reaches writeBatchSize. The allowed value is integer (number of rows). |
No. The default is 10000. |
writeBatchTimeout | The wait time for the batch insert operation to finish before it times out. The allowed value is timespan. Example: “00:30:00” (30 minutes). |
No |
preCopyScript | Specify a SQL query for Copy Activity to run before writing data into Azure SQL Database. It's only invoked once per copy run. Use this property to clean up the preloaded data. | No |
sqlWriterStoredProcedureName | The name of the stored procedure that defines how to apply source data into a target table. An example is to do upserts or transform by using your own business logic. This stored procedure is invoked per batch. For operations that only run once and have nothing to do with source data, use the preCopyScript property. Example operations are delete and truncate. |
No |
storedProcedureParameters | Parameters for the stored procedure. Allowed values are name and value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. |
No |
sqlWriterTableType | Specify a table type name to be used in the stored procedure. Copy Activity makes the data being moved available in a temporary table with this table type. Stored procedure code can then merge the data being copied with existing data. | No |
Tip
When you copy data to Azure SQL Database, Copy Activity appends data to the sink table by default. To do an upsert or additional business logic, use the stored procedure in SqlSink. Learn more details from Invoking stored procedure from SQL Sink.
"activities":[
{
"name": "CopyToAzureSQLDatabase",
"type": "Copy",
"inputs": [
{
"referenceName": "<input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<Azure SQL Database output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "<source type>"
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 100000
}
}
}
]
Learn more details from Invoking stored procedure from SQL Sink.
"activities":[
{
"name": "CopyToAzureSQLDatabase",
"type": "Copy",
"inputs": [
{
"referenceName": "<input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<Azure SQL Database output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "<source type>"
},
"sink": {
"type": "SqlSink",
"sqlWriterStoredProcedureName": "CopyTestStoredProcedureWithParameters",
"sqlWriterTableType": "CopyTestTableType",
"storedProcedureParameters": {
"identifier": { "value": "1", "type": "Int" },
"stringData": { "value": "str1" }
}
}
}
}
]
This section shows you how to copy data from a source table without an identity column to a destination table with an identity column.
create table dbo.SourceTbl
(
name varchar(100),
age int
)
create table dbo.TargetTbl
(
identifier int identity(1,1),
name varchar(100),
age int
)
Note
The target table has an identity column.
{
"name": "SampleSource",
"properties": {
"type": " AzureSqlTable",
"linkedServiceName": {
"referenceName": "TestIdentitySQL",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "SourceTbl"
}
}
}
{
"name": "SampleTarget",
"properties": {
"structure": [
{ "name": "name" },
{ "name": "age" }
],
"type": "AzureSqlTable",
"linkedServiceName": {
"referenceName": "TestIdentitySQL",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "TargetTbl"
}
}
}
Note
Your source and target table have different schema.
The target has an additional column with an identity. In this scenario, you must specify the structure property in the target dataset definition, which doesn’t include the identity column.
When you copy data into Azure SQL Database, you can also configure and invoke a user-specified stored procedure with additional parameters.
You can use a stored procedure when built-in copy mechanisms don't serve the purpose. They're typically used when an upsert, insert plus update, or extra processing must be done before the final insertion of source data into the destination table. Some extra processing examples are merge columns, look up additional values, and insertion into more than one table.
The following sample shows how to use a stored procedure to do an upsert into a table in Azure SQL Database. Assume that input data and the sink Marketing table each have three columns: ProfileID, State, and Category. Do the upsert based on the ProfileID column, and only apply it for a specific category.
{
"name": "AzureSQLDbDataset",
"properties":
{
"type": "AzureSqlTable",
"linkedServiceName": {
"referenceName": "<Azure SQL Database linked service name>",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "Marketing"
}
}
}
Define the SqlSink section in Copy Activity:
"sink": {
"type": "SqlSink",
"SqlWriterTableType": "MarketingType",
"SqlWriterStoredProcedureName": "spOverwriteMarketing",
"storedProcedureParameters": {
"category": {
"value": "ProductA"
}
}
}
In your database, define the stored procedure with the same name as the SqlWriterStoredProcedureName. It handles input data from your specified source and merges into the output table. The parameter name of the table type in the stored procedure should be the same as the tableName defined in the dataset.
CREATE PROCEDURE spOverwriteMarketing @Marketing [dbo].[MarketingType] READONLY, @category varchar(256)
AS
BEGIN
MERGE [dbo].[Marketing] AS target
USING @Marketing AS source
ON (target.ProfileID = source.ProfileID and target.Category = @category)
WHEN MATCHED THEN
UPDATE SET State = source.State
WHEN NOT MATCHED THEN
INSERT (ProfileID, State, Category)
VALUES (source.ProfileID, source.State, source.Category)
END
In your database, define the table type with the same name as the sqlWriterTableType. The schema of the table type should be same as the schema returned by your input data.
CREATE TYPE [dbo].[MarketingType] AS TABLE(
[ProfileID] [varchar](256) NOT NULL,
[State] [varchar](256) NOT NULL,
[Category] [varchar](256) NOT NULL,
)
The stored procedure feature takes advantage of Table-Valued Parameters.
Note
If you write to Money/Smallmoney data type by invoking Stored Procedure, values may be rounded. Specify the corresponding data type in TVP as Decimal instead of Money/Smallmoney to mitigate.
When you copy data from or to Azure SQL Database, the following mappings are used from Azure SQL Database data types to Azure Data Factory interim data types. See Schema and data type mappings to learn how Copy Activity maps the source schema and data type to the sink.
Azure SQL Database data type | Data Factory interim data type |
---|---|
bigint | Int64 |
binary | Byte[] |
bit | Boolean |
char | String, Char[] |
date | DateTime |
Datetime | DateTime |
datetime2 | DateTime |
Datetimeoffset | DateTimeOffset |
Decimal | Decimal |
FILESTREAM attribute (varbinary(max)) | Byte[] |
Float | Double |
image | Byte[] |
int | Int32 |
money | Decimal |
nchar | String, Char[] |
ntext | String, Char[] |
numeric | Decimal |
nvarchar | String, Char[] |
real | Single |
rowversion | Byte[] |
smalldatetime | DateTime |
smallint | Int16 |
smallmoney | Decimal |
sql_variant | Object * |
text | String, Char[] |
time | TimeSpan |
timestamp | Byte[] |
tinyint | Byte |
uniqueidentifier | Guid |
varbinary | Byte[] |
varchar | String, Char[] |
xml | Xml |
For a list of data stores supported as sources and sinks by Copy Activity in Azure Data Factory, see Supported data stores and formats.