---
title: Copy data to and from Azure Table storage by using Data Factory | Microsoft Docs
description: Learn how to copy data from supported source stores to Azure Table storage, or from Table storage to supported sink stores, by using Data Factory.
services: data-factory
documentationcenter: ''
author: linda33wj
manager: craigg
ms.reviewer: douglasl
ms.service: data-factory
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: conceptual
ms.date: 08/17/2018
ms.author: jingwang
---
This article outlines how to use Copy Activity in Azure Data Factory to copy data to and from Azure Table storage. It builds on the Copy Activity overview article that presents a general overview of Copy Activity.
You can copy data from any supported source data store to Table storage. You also can copy data from Table storage to any supported sink data store. For a list of data stores that are supported as sources or sinks by the copy activity, see the Supported data stores table.
Specifically, this Azure Table connector supports copying data by using account key and service shared access signature authentications.
[!INCLUDE data-factory-v2-connector-get-started]
The following sections provide details about properties that are used to define Data Factory entities specific to Table storage.
You can create an Azure Storage linked service by using the account key. It provides the data factory with global access to Storage. The following properties are supported.
Property | Description | Required |
---|---|---|
type | The type property must be set to AzureTableStorage. | Yes |
connectionString | Specify the information needed to connect to Storage for the connectionString property. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes |
connectVia | The integration runtime to be used to connect to the data store. You can use Azure Integration Runtime or Self-hosted Integration Runtime (if your data store is located in a private network). If not specified, it uses the default Azure Integration Runtime. | No |
Note
If you were using the "AzureStorage" type linked service, it is still supported as-is, but we suggest that you use the new "AzureTableStorage" linked service type going forward.
Example:
```json
{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureTableStorage",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```
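If you prefer to keep the account key out of the Data Factory definition, the connectionString can instead reference a secret stored in Azure Key Vault, as mentioned in the table above. A minimal sketch, assuming a Key Vault linked service has already been created (the names are placeholders):

```json
{
    "name": "AzureTableStorageLinkedService",
    "properties": {
        "type": "AzureTableStorage",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "<Azure Key Vault linked service name>",
                    "type": "LinkedServiceReference"
                },
                "secretName": "<name of the secret that holds the connection string>"
            }
        }
    }
}
```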
You also can create a Storage linked service by using a shared access signature. It provides the data factory with restricted, time-bound access to all or specific resources in the storage account.
A shared access signature provides delegated access to resources in your storage account. You can use it to grant a client limited permissions to objects in your storage account for a specified time and with a specified set of permissions. You don't have to share your account access keys. The shared access signature is a URI that encompasses in its query parameters all the information necessary for authenticated access to a storage resource. To access storage resources with the shared access signature, the client only needs to pass in the shared access signature to the appropriate constructor or method. For more information about shared access signatures, see Shared access signatures: Understand the shared access signature model.
Note
Data Factory now supports both service shared access signatures and account shared access signatures. For more information about these two types and how to construct them, see Types of shared access signatures.
Tip
To generate a service shared access signature for your storage account, you can execute the following PowerShell commands. Replace the placeholders and grant the needed permission.
```powershell
$context = New-AzureStorageContext -StorageAccountName <accountName> -StorageAccountKey <accountKey>
New-AzureStorageTableSASToken -Name <tableName> -Context $context -Permission raud -StartTime <startTime> -ExpiryTime <endTime> -FullUri
```
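The -FullUri switch returns the table URI together with the appended token, which is the form that the sasUri property described next expects.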
To use shared access signature authentication, the following properties are supported.
Property | Description | Required |
---|---|---|
type | The type property must be set to AzureTableStorage. | Yes |
sasUri | Specify the shared access signature URI to the Storage resources, such as blob, container, or table. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes |
connectVia | The integration runtime to be used to connect to the data store. You can use the Azure Integration Runtime or the Self-hosted Integration Runtime (if your data store is located in a private network). If not specified, it uses the default Azure Integration Runtime. | No |
Note
If you were using the "AzureStorage" type linked service, it is still supported as-is, but we suggest that you use the new "AzureTableStorage" linked service type going forward.
Example:
```json
{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureTableStorage",
        "typeProperties": {
            "sasUri": {
                "type": "SecureString",
                "value": "<SAS URI of the Azure Storage resource>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```
When you create a shared access signature URI, consider the following points:
- Set appropriate read/write permissions on objects based on how the linked service is used in your data factory (read, write, or read/write).
- Set the expiry time appropriately. Make sure that the access to Storage objects doesn't expire within the active period of the pipeline.
- Create the URI at the right table level based on your needs.
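For reference, a table-scoped shared access signature URI generally has the following shape; the placeholders below are illustrative only, not a working token:

```
https://<accountname>.table.core.windows.net/<tablename>?sv=<storageVersion>&tn=<tablename>&st=<startTime>&se=<expiryTime>&sp=<permissions>&sig=<signature>
```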
For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the Azure Table dataset.
To copy data to and from Azure Table, set the type property of the dataset to AzureTable. The following properties are supported.
Property | Description | Required |
---|---|---|
type | The type property of the dataset must be set to AzureTable. | Yes |
tableName | The name of the table in the Table storage instance that the linked service refers to. | Yes |
Example:
```json
{
    "name": "AzureTableDataset",
    "properties": {
        "type": "AzureTable",
        "linkedServiceName": {
            "referenceName": "<Azure Table storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "MyTable"
        }
    }
}
```
For schema-free data stores such as Azure Table, Data Factory infers the schema in one of the following ways:
- If you specify the structure of data by using the structure property in the dataset definition, Data Factory honors this structure as the schema. In this case, if a row doesn't contain a value for a column, a null value is provided for it.
- If you don't specify the structure of data by using the structure property in the dataset definition, Data Factory infers the schema by using the first row in the data. In this case, if the first row doesn't contain the full schema, some columns are missing from the result of the copy operation.
For schema-free data sources, the best practice is to specify the structure of data by using the structure property.
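To illustrate that best practice, here's a minimal sketch of the previous dataset with an explicit structure; the column names and types are hypothetical:

```json
{
    "name": "AzureTableDatasetWithStructure",
    "properties": {
        "type": "AzureTable",
        "linkedServiceName": {
            "referenceName": "<Azure Table storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "structure": [
            { "name": "PartitionKey", "type": "String" },
            { "name": "RowKey", "type": "String" },
            { "name": "LastModifiedTime", "type": "DateTime" }
        ],
        "typeProperties": {
            "tableName": "MyTable"
        }
    }
}
```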
For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Azure Table source and sink.
To copy data from Azure Table, set the source type in the copy activity to AzureTableSource. The following properties are supported in the copy activity source section.
Property | Description | Required |
---|---|---|
type | The type property of the copy activity source must be set to AzureTableSource. | Yes |
azureTableSourceQuery | Use the custom Table storage query to read data. See examples in the following section. | No |
azureTableSourceIgnoreTableNotFound | Indicates whether to allow the copy to succeed when the table doesn't exist, instead of failing with an exception. Allowed values are True and False (default). | No |
If the Azure Table column is of the datetime type:

```json
"azureTableSourceQuery": "LastModifiedTime gt datetime'2017-10-01T00:00:00' and LastModifiedTime le datetime'2017-10-02T00:00:00'"
```

If the Azure Table column is of the string type:

```json
"azureTableSourceQuery": "LastModifiedTime ge '201710010000_0000' and LastModifiedTime le '201710010000_9999'"
```
If you use a pipeline parameter, cast the datetime value to the proper format according to the previous samples, as shown in the sketch below.
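As a minimal sketch, assuming a hypothetical windowStart pipeline parameter of type String, the source section might build the query with an expression:

```json
"source": {
    "type": "AzureTableSource",
    "azureTableSourceIgnoreTableNotFound": false,
    "azureTableSourceQuery": {
        "value": "LastModifiedTime gt datetime'@{formatDateTime(pipeline().parameters.windowStart, 'yyyy-MM-ddTHH:mm:ss')}'",
        "type": "Expression"
    }
}
```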
To copy data to Azure Table, set the sink type in the copy activity to AzureTableSink. The following properties are supported in the copy activity sink section.
Property | Description | Required |
---|---|---|
type | The type property of the copy activity sink must be set to AzureTableSink. | Yes |
azureTableDefaultPartitionKeyValue | The default partition key value that can be used by the sink. | No |
azureTablePartitionKeyName | Specify the name of the column whose values are used as partition keys. If not specified, the value of azureTableDefaultPartitionKeyValue is used as the partition key. | No |
azureTableRowKeyName | Specify the name of the column whose values are used as the row key. If not specified, a GUID is used for each row. | No |
azureTableInsertType | The mode to insert data into Azure Table. This property controls whether existing rows in the output table with matching partition and row keys have their values replaced or merged. Allowed values are merge (default) and replace. This setting applies at the row level, not the table level. Neither option deletes rows in the output table that don't exist in the input. To learn about how the merge and replace settings work, see Insert or merge entity and Insert or replace entity. | No |
writeBatchSize | Inserts data into Azure Table when writeBatchSize or writeBatchTimeout is hit. Allowed values are integers (number of rows). | No (default is 10,000) |
writeBatchTimeout | Inserts data into Azure Table when writeBatchSize or writeBatchTimeout is hit. Allowed values are timespans. An example is "00:20:00" (20 minutes). | No (default is 90 seconds, the storage client's default timeout) |
Example:
"activities":[
{
"name": "CopyToAzureTable",
"type": "Copy",
"inputs": [
{
"referenceName": "<input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<Azure Table output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "<source type>"
},
"sink": {
"type": "AzureTableSink",
"azureTablePartitionKeyName": "<column name>",
"azureTableRowKeyName": "<column name>"
}
}
}
]
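The sink in this example uses the default merge behavior. If you want replace semantics or different batching, the sink section accepts the properties from the preceding table; a sketch with illustrative values:

```json
"sink": {
    "type": "AzureTableSink",
    "azureTableInsertType": "replace",
    "azureTableDefaultPartitionKeyValue": "DefaultPartition",
    "writeBatchSize": 5000,
    "writeBatchTimeout": "00:05:00"
}
```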
Before you can use a destination column as azureTablePartitionKeyName, map a source column to the destination column by using the "translator" property.
In the following example, source column DivisionID is mapped to the destination column DivisionID:
"translator": {
"type": "TabularTranslator",
"columnMappings": "DivisionID: DivisionID, FirstName: FirstName, LastName: LastName"
}
"DivisionID" is specified as the partition key.
"sink": {
"type": "AzureTableSink",
"azureTablePartitionKeyName": "DivisionID"
}
When you copy data from and to Azure Table, the following mappings are used from Azure Table data types to Data Factory interim data types. To learn about how the copy activity maps the source schema and data type to the sink, see Schema and data type mappings.
When you move data to and from Azure Table, the following mappings defined by Azure Table are used from Azure Table OData types to .NET types and vice versa.
Azure Table data type | Data Factory interim data type | Details |
---|---|---|
Edm.Binary | byte[] | An array of bytes up to 64 KB. |
Edm.Boolean | bool | A Boolean value. |
Edm.DateTime | DateTime | A 64-bit value expressed as Coordinated Universal Time (UTC). The supported DateTime range begins midnight, January 1, 1601 A.D. (C.E.), UTC. The range ends December 31, 9999. |
Edm.Double | double | A 64-bit floating point value. |
Edm.Guid | Guid | A 128-bit globally unique identifier. |
Edm.Int32 | Int32 | A 32-bit integer. |
Edm.Int64 | Int64 | A 64-bit integer. |
Edm.String | String | A UTF-16-encoded value. String values can be up to 64 KB. |
For a list of data stores supported as sources and sinks by the copy activity in Data Factory, see Supported data stores.