title | description | services | documentationcenter | author | manager | ms.reviewer | ms.service | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Copy data to/from Azure SQL Database Managed Instance using Azure Data Factory | Microsoft Docs |
Learn about how to move data to/from Azure SQL Database Managed Instance using Azure Data Factory. |
data-factory |
linda33wj |
craigg |
douglasl |
data-factory |
data-services |
na |
na |
conceptual |
11/15/2018 |
jingwang |
This article outlines how to use the Copy Activity in Azure Data Factory to copy data from and to an Azure SQL Database Managed Instance. It builds on the copy activity overview article that presents a general overview of copy activity.
You can copy data from Azure SQL Database Managed Instance to any supported sink data store, or copy data from any supported source data store to the Managed Instance. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.
Specifically, this Azure SQL Database Managed Instance connector supports:
- Copying data using SQL or Windows authentication.
- As source, retrieving data using SQL query or stored procedure.
- As sink, appending data to destination table or invoking a stored procedure with custom logic during copy.
To use copy data from an Azure SQL Database Managed Instance which is located in VNET, you need to set up a Self-hosted Integration Runtime in the same VNET that can access the database. See Self-hosted Integration Runtime article for details.
[!INCLUDE data-factory-v2-connector-get-started]
The following sections provide details about properties that are used to define Data Factory entities specific to Azure SQL Database Managed Instance connector.
The following properties are supported for Azure SQL Database Managed Instance linked service:
Property | Description | Required |
---|---|---|
type | The type property must be set to: SqlServer | Yes |
connectionString | Specify connectionString information needed to connect to the Managed Instance using either SQL authentication or Windows authentication. Refer to the following sample. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes |
userName | Specify user name if you are using Windows Authentication. Example: domainname\username. | No |
password | Specify password for the user account you specified for the userName. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | No |
connectVia | The Integration Runtime to be used to connect to the data store. Provision the Self-hosted Integration Runtime in the same VNET as your Managed Instance. | Yes |
Tip
If you hit error with error code as "UserErrorFailedToConnectToSqlServer" and message like "The session limit for the database is XXX and has been reached.", add Pooling=false
to your connection string and try again.
Example 1: Using SQL authentication
{
"name": "SqlServerLinkedService",
"properties": {
"type": "SqlServer",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Data Source=<servername>\\<instance name if using named instance>;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;Password=<password>;"
}
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
Example 2: Using Windows authentication
{
"name": "SqlServerLinkedService",
"properties": {
"type": "SqlServer",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Data Source=<servername>\\<instance name if using named instance>;Initial Catalog=<databasename>;Integrated Security=True;"
},
"userName": "<domain\\username>",
"password": {
"type": "SecureString",
"value": "<password>"
}
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
For a full list of sections and properties available for defining datasets, see the datasets article. This section provides a list of properties supported by Azure SQL Database Managed Instance dataset.
To copy data from/to Azure SQL Database Managed Instance, set the type property of the dataset to SqlServerTable. The following properties are supported:
Property | Description | Required |
---|---|---|
type | The type property of the dataset must be set to: SqlServerTable | Yes |
tableName | Name of the table or view in the database instance that linked service refers to. | No for source, Yes for sink |
Example
{
"name": "SQLServerDataset",
"properties":
{
"type": "SqlServerTable",
"linkedServiceName": {
"referenceName": "<Managed Instance linked service name>",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "MyTable"
}
}
}
For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by Azure SQL Database Managed Instance source and sink.
To copy data from Azure SQL Database Managed Instance, set the source type in the copy activity to SqlSource. The following properties are supported in the copy activity source section:
Property | Description | Required |
---|---|---|
type | The type property of the copy activity source must be set to: SqlSource | Yes |
sqlReaderQuery | Use the custom SQL query to read data. Example: select * from MyTable . |
No |
sqlReaderStoredProcedureName | Name of the stored procedure that reads data from the source table. The last SQL statement must be a SELECT statement in the stored procedure. | No |
storedProcedureParameters | Parameters for the stored procedure. Allowed values are: name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. |
No |
Points to note
- If the sqlReaderQuery is specified for the SqlSource, the Copy Activity runs this query against the Managed Instance source to get the data. Alternatively, you can specify a stored procedure by specifying the sqlReaderStoredProcedureName and storedProcedureParameters (if the stored procedure takes parameters).
- If you do not specify either "sqlReaderQuery" or "sqlReaderStoredProcedureName" property, the columns defined in the "structure" section of the dataset JSON are used to construct a query (
select column1, column2 from mytable
) to run against the Managed Instance. If the dataset definition does not have the "structure", all columns are selected from the table.
Example: Using a SQL query
"activities":[
{
"name": "CopyFromSQLServer",
"type": "Copy",
"inputs": [
{
"referenceName": "<Managed Instance input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "SELECT * FROM MyTable"
},
"sink": {
"type": "<sink type>"
}
}
}
]
Example: Using a stored procedure
"activities":[
{
"name": "CopyFromSQLServer",
"type": "Copy",
"inputs": [
{
"referenceName": "<Managed Instance input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderStoredProcedureName": "CopyTestSrcStoredProcedureWithParameters",
"storedProcedureParameters": {
"stringData": { "value": "str3" },
"identifier": { "value": "$$Text.Format('{0:yyyy}', <datetime parameter>)", "type": "Int"}
}
},
"sink": {
"type": "<sink type>"
}
}
}
]
The stored procedure definition
CREATE PROCEDURE CopyTestSrcStoredProcedureWithParameters
(
@stringData varchar(20),
@identifier int
)
AS
SET NOCOUNT ON;
BEGIN
select *
from dbo.UnitTestSrcTable
where dbo.UnitTestSrcTable.stringData != stringData
and dbo.UnitTestSrcTable.identifier != identifier
END
GO
To copy data to Azure SQL Database Managed Instance, set the sink type in the copy activity to SqlSink. The following properties are supported in the copy activity sink section:
Property | Description | Required |
---|---|---|
type | The type property of the copy activity sink must be set to: SqlSink | Yes |
writeBatchSize | Inserts data into the SQL table when the buffer size reaches writeBatchSize. Allowed values are: integer (number of rows). |
No (default: 10000) |
writeBatchTimeout | Wait time for the batch insert operation to complete before it times out. Allowed values are: timespan. Example: “00:30:00” (30 minutes). |
No |
preCopyScript | Specify a SQL query for Copy Activity to execute before writing data into Managed Instance. It will only be invoked once per copy run. You can use this property to clean up the pre-loaded data. | No |
sqlWriterStoredProcedureName | Name of the stored procedure that defines how to apply source data into target table, e.g. to do upserts or transform using your own business logic. Note this stored procedure will be invoked per batch. If you want to do operation that only runs once and has nothing to do with source data e.g. delete/truncate, use preCopyScript property. |
No |
storedProcedureParameters | Parameters for the stored procedure. Allowed values are: name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. |
No |
sqlWriterTableType | Specify a table type name to be used in the stored procedure. Copy activity makes the data being moved available in a temp table with this table type. Stored procedure code can then merge the data being copied with existing data. | No |
Tip
When copying data to Azure SQL Database Managed Instance, the copy activity appends data to the sink table by default. To perform an UPSERT or additional business logic, use the stored procedure in SqlSink. Learn more details from Invoking stored procedure for SQL Sink.
Example 1: Appending data
"activities":[
{
"name": "CopyToSQLServer",
"type": "Copy",
"inputs": [
{
"referenceName": "<input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<Managed Instance output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "<source type>"
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 100000
}
}
}
]
Example 2: Invoking a stored procedure during copy for upsert
Learn more details from Invoking stored procedure for SQL Sink.
"activities":[
{
"name": "CopyToSQLServer",
"type": "Copy",
"inputs": [
{
"referenceName": "<input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<Managed Instance output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "<source type>"
},
"sink": {
"type": "SqlSink",
"sqlWriterStoredProcedureName": "CopyTestStoredProcedureWithParameters",
"sqlWriterTableType": "CopyTestTableType",
"storedProcedureParameters": {
"identifier": { "value": "1", "type": "Int" },
"stringData": { "value": "str1" }
}
}
}
}
]
This section provides an example that copies data from a source table with no identity column to a destination table with an identity column.
Source table
create table dbo.SourceTbl
(
name varchar(100),
age int
)
Destination table
create table dbo.TargetTbl
(
identifier int identity(1,1),
name varchar(100),
age int
)
Notice that the target table has an identity column.
Source dataset JSON definition
{
"name": "SampleSource",
"properties": {
"type": " SqlServerTable",
"linkedServiceName": {
"referenceName": "TestIdentitySQL",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "SourceTbl"
}
}
}
Destination dataset JSON definition
{
"name": "SampleTarget",
"properties": {
"structure": [
{ "name": "name" },
{ "name": "age" }
],
"type": "SqlServerTable",
"linkedServiceName": {
"referenceName": "TestIdentitySQL",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "TargetTbl"
}
}
}
Notice that as your source and target table have different schema (target has an additional column with identity). In this scenario, you need to specify structure property in the target dataset definition, which doesn’t include the identity column.
When copying data into Azure SQL Database Managed Instance, a user specified stored procedure could be configured and invoked with additional parameters.
A stored procedure can be used when built-in copy mechanisms do not serve the purpose. It is typically used when upsert (insert + update) or extra processing (merging columns, looking up additional values, insertion into multiple tables, etc.) needs to be done before the final insertion of source data in the destination table.
The following sample shows how to use a stored procedure to do an upsert into a table in the Managed Instance. Assuming input data and the sink "Marketing" table each has three columns: ProfileID, State, and Category. Perform upsert based on the “ProfileID” column and only apply for a specific category.
Output dataset
{
"name": "SQLServerDataset",
"properties":
{
"type": "SqlServerTable",
"linkedServiceName": {
"referenceName": "<Managed Instance linked service name>",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "Marketing"
}
}
}
Define the SqlSink section in copy activity as follows.
"sink": {
"type": "SqlSink",
"SqlWriterTableType": "MarketingType",
"SqlWriterStoredProcedureName": "spOverwriteMarketing",
"storedProcedureParameters": {
"category": {
"value": "ProductA"
}
}
}
In your database, define the stored procedure with the same name as SqlWriterStoredProcedureName. It handles input data from your specified source, and merge into the output table. The parameter name of the table type in the stored procedure should be the same as the "tableName" defined in dataset.
CREATE PROCEDURE spOverwriteMarketing @Marketing [dbo].[MarketingType] READONLY, @category varchar(256)
AS
BEGIN
MERGE [dbo].[Marketing] AS target
USING @Marketing AS source
ON (target.ProfileID = source.ProfileID and target.Category = @category)
WHEN MATCHED THEN
UPDATE SET State = source.State
WHEN NOT MATCHED THEN
INSERT (ProfileID, State, Category)
VALUES (source.ProfileID, source.State, source.Category)
END
In your database, define the table type with the same name as sqlWriterTableType. Notice that the schema of the table type should be same as the schema returned by your input data.
CREATE TYPE [dbo].[MarketingType] AS TABLE(
[ProfileID] [varchar](256) NOT NULL,
[State] [varchar](256) NOT NULL,
[Category] [varchar](256) NOT NULL,
)
The stored procedure feature takes advantage of Table-Valued Parameters.
Note
If you write to Money/Smallmoney data type by invoking Stored Procedure, values may be rounded. Specify the corresponding data type in TVP as Decimal instead of Money/Smallmoney to mitigate.
When copying data from/to Azure SQL Database Managed Instance, the following mappings are used from Managed Instance data types to Azure Data Factory interim data types. See Schema and data type mappings to learn about how copy activity maps the source schema and data type to the sink.
Azure SQL Database Managed Instance data type | Data factory interim data type |
---|---|
bigint | Int64 |
binary | Byte[] |
bit | Boolean |
char | String, Char[] |
date | DateTime |
Datetime | DateTime |
datetime2 | DateTime |
Datetimeoffset | DateTimeOffset |
Decimal | Decimal |
FILESTREAM attribute (varbinary(max)) | Byte[] |
Float | Double |
image | Byte[] |
int | Int32 |
money | Decimal |
nchar | String, Char[] |
ntext | String, Char[] |
numeric | Decimal |
nvarchar | String, Char[] |
real | Single |
rowversion | Byte[] |
smalldatetime | DateTime |
smallint | Int16 |
smallmoney | Decimal |
sql_variant | Object * |
text | String, Char[] |
time | TimeSpan |
timestamp | Byte[] |
tinyint | Int16 |
uniqueidentifier | Guid |
varbinary | Byte[] |
varchar | String, Char[] |
xml | Xml |
For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.