---
title: Move data to/from Azure Table | Microsoft Docs
description: Learn how to move data to/from Azure Table Storage using Azure Data Factory.
services: data-factory
documentationcenter: ''
author: linda33wj
manager: craigg
ms.assetid: 07b046b1-7884-4e57-a613-337292416319
ms.service: data-factory
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: conceptual
ms.date: 01/22/2018
ms.author: jingwang
robots: noindex
---
> [!NOTE]
> This article applies to version 1 of Data Factory. If you are using the current version of the Data Factory service, see the Azure Table Storage connector in V2.
This article explains how to use the Copy Activity in Azure Data Factory to move data to/from Azure Table Storage. It builds on the Data Movement Activities article, which presents a general overview of data movement with the copy activity.
You can copy data from any supported source data store to Azure Table Storage or from Azure Table Storage to any supported sink data store. For a list of data stores supported as sources or sinks by the copy activity, see the Supported data stores table.
You can create a pipeline with a copy activity that moves data to/from Azure Table Storage by using different tools/APIs.
The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline using the Copy data wizard.
You can also use the following tools to create a pipeline: Azure portal, Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.
Whether you use the tools or APIs, you perform the following steps to create a pipeline that moves data from a source data store to a sink data store:
- Create linked services to link input and output data stores to your data factory.
- Create datasets to represent input and output data for the copy operation.
- Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output.
When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools/APIs (except .NET API), you define these Data Factory entities by using the JSON format. For samples with JSON definitions for Data Factory entities that are used to copy data to/from Azure Table Storage, see the JSON examples section of this article.
The following sections provide details about JSON properties that are used to define Data Factory entities specific to Azure Table Storage:
There are two types of linked services you can use to link an Azure storage account to an Azure data factory: the AzureStorage linked service and the AzureStorageSas linked service. The AzureStorage linked service provides the data factory with global access to the Azure Storage account, whereas the AzureStorageSas (Shared Access Signature) linked service provides the data factory with restricted, time-bound access. There are no other differences between these two linked services. Choose the linked service that suits your needs. The following sections provide more details on these two linked services.
[!INCLUDE data-factory-azure-storage-linked-services]
For a full list of sections & properties available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types (Azure SQL, Azure blob, Azure table, etc.).
The typeProperties section is different for each type of dataset and provides information about the location of the data in the data store. The typeProperties section for the dataset of type AzureTable has the following properties.
Property | Description | Required |
---|---|---|
tableName | Name of the table in the Azure Table storage instance that the linked service refers to. | Yes. When a tableName is specified without an azureTableSourceQuery, all records from the table are copied to the destination. If an azureTableSourceQuery is also specified, only the records that satisfy the query are copied to the destination. |
For schema-free data stores such as Azure Table, the Data Factory service infers the schema in one of the following ways:
- If you specify the structure of data by using the structure property in the dataset definition, the Data Factory service honors this structure as the schema. In this case, if a row does not contain a value for a column, a null value is provided for it.
- If you don't specify the structure of data by using the structure property in the dataset definition, Data Factory infers the schema by using the first row in the data. In this case, if the first row does not contain the full schema, some columns are missing in the result of the copy operation.
Therefore, for schema-free data sources, the best practice is to specify the structure of data using the structure property.
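As a minimal sketch (the column names here are hypothetical placeholders, not values from the samples later in this article), an AzureTable dataset that pins the schema with the structure property could look like the following:
{
    "name": "AzureTableInputWithStructure",
    "properties": {
        "type": "AzureTable",
        "linkedServiceName": "StorageLinkedService",
        "structure": [
            { "name": "DeviceId", "type": "String" },
            { "name": "Temperature", "type": "Double" },
            { "name": "ReadingTime", "type": "Datetime" }
        ],
        "typeProperties": {
            "tableName": "MyTable"
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}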
For a full list of sections & properties available for defining activities, see the Creating Pipelines article. Properties such as name, description, input and output datasets, and policies are available for all types of activities.
Properties available in the typeProperties section of the activity, on the other hand, vary with each activity type. For the Copy activity, they vary depending on the types of sources and sinks.
AzureTableSource supports the following properties in the typeProperties section:
Property | Description | Allowed values | Required |
---|---|---|---|
azureTableSourceQuery | Use the custom query to read data. | Azure table query string. See examples in the next section. | No. When a tableName is specified without an azureTableSourceQuery, all records from the table are copied to the destination. If an azureTableSourceQuery is also specified, only the records that satisfy the query are copied to the destination. |
azureTableSourceIgnoreTableNotFound | Indicates whether to swallow the exception when the table does not exist. | TRUE, FALSE | No |
If the Azure Table column is of the string type:
"azureTableSourceQuery": "$$Text.Format('PartitionKey ge \\'{0:yyyyMMddHH00_0000}\\' and PartitionKey le \\'{0:yyyyMMddHH00_9999}\\'', SliceStart)"
If the Azure Table column is of the datetime type:
"azureTableSourceQuery": "$$Text.Format('DeploymentEndTime gt datetime\\'{0:yyyy-MM-ddTHH:mm:ssZ}\\' and DeploymentEndTime le datetime\\'{1:yyyy-MM-ddTHH:mm:ssZ}\\'', SliceStart, SliceEnd)"
AzureTableSink supports the following properties in the typeProperties section:
Property | Description | Allowed values | Required |
---|---|---|---|
azureTableDefaultPartitionKeyValue | Default partition key value that can be used by the sink. | A string value. | No |
azureTablePartitionKeyName | Specify the name of the column whose values are used as partition keys. If not specified, azureTableDefaultPartitionKeyValue is used as the partition key. | A column name. | No |
azureTableRowKeyName | Specify the name of the column whose values are used as the row key. If not specified, a GUID is used for each row. | A column name. | No |
azureTableInsertType | The mode to insert data into the Azure table. This property controls whether existing rows in the output table with matching partition and row keys have their values replaced or merged. To learn about how these settings (merge and replace) work, see the Insert or Merge Entity and Insert or Replace Entity topics. This setting applies at the row level, not the table level, and neither option deletes rows in the output table that do not exist in the input. | merge (default), replace | No |
writeBatchSize | Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit. | Integer (number of rows) | No (default: 10000) |
writeBatchTimeout | Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit. | timespan. Example: "00:20:00" (20 minutes) | No (default: the storage client default timeout value of 90 seconds) |
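As an illustrative sketch only (the Region and OrderId column names are hypothetical), a sink that replaces matching rows instead of merging them, and that derives the partition and row keys from columns, could look like this. As described next, the partition key column must also be mapped with the translator property.
"sink": {
    "type": "AzureTableSink",
    "azureTableInsertType": "replace",
    "azureTablePartitionKeyName": "Region",
    "azureTableRowKeyName": "OrderId",
    "writeBatchSize": 100,
    "writeBatchTimeout": "00:20:00"
}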
Before you can use a destination column as the azureTablePartitionKeyName, you must map a source column to the destination column by using the translator JSON property.
In the following example, the source column DivisionID is mapped to the destination column DivisionID.
"translator": {
"type": "TabularTranslator",
"columnMappings": "DivisionID: DivisionID, FirstName: FirstName, LastName: LastName"
}
The DivisionID column is specified as the partition key:
"sink": {
"type": "AzureTableSink",
"azureTablePartitionKeyName": "DivisionID",
"writeBatchSize": 100,
"writeBatchTimeout": "01:00:00"
}
The following examples provide sample JSON definitions that you can use to create a pipeline by using the Azure portal, Visual Studio, or Azure PowerShell. They show how to copy data to and from Azure Table Storage and Azure Blob Storage. However, data can be copied directly from any of the sources to any of the supported sinks. For more information, see the section "Supported data stores and formats" in Move data by using Copy Activity.
The following sample shows:
- A linked service of type AzureStorage (used for both table & blob).
- An input dataset of type AzureTable.
- An output dataset of type AzureBlob.
- The pipeline with Copy activity that uses AzureTableSource and BlobSink.
The sample copies data belonging to the default partition in an Azure Table to a blob every hour. The JSON properties used in these samples are described in sections following the samples.
Azure storage linked service:
{
"name": "StorageLinkedService",
"properties": {
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
}
}
}
Azure Data Factory supports two types of Azure Storage linked services: AzureStorage and AzureStorageSas. For the first one, you specify the connection string that includes the account key; for the latter one, you specify the Shared Access Signature (SAS) URI. See the Linked services section for details.
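For reference, a linked service of type AzureStorageSas follows the same shape but takes a SAS URI instead of a connection string; the sketch below uses a placeholder value that you replace with a valid SAS URI for your storage resource:
{
    "name": "StorageSasLinkedService",
    "properties": {
        "type": "AzureStorageSas",
        "typeProperties": {
            "sasUri": "<Specify SAS URI of the Azure Storage resource>"
        }
    }
}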
Azure Table input dataset:
The sample assumes you have created a table "MyTable" in Azure Table storage.
Setting "external": "true" informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.
{
"name": "AzureTableInput",
"properties": {
"type": "AzureTable",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"tableName": "MyTable"
},
"external": true,
"availability": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"externalData": {
"retryInterval": "00:01:00",
"retryTimeout": "00:10:00",
"maximumRetry": 3
}
}
}
}
Azure Blob output dataset:
Data is written to a new blob every hour (frequency: hour, interval: 1). The folder path for the blob is dynamically evaluated based on the start time of the slice that is being processed. The folder path uses the year, month, day, and hour parts of the start time.
{
"name": "AzureBlobOutput",
"properties": {
"type": "AzureBlob",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}",
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "MM"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "dd"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "HH"
}
}
],
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"rowDelimiter": "\n"
}
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Copy activity in a pipeline with AzureTableSource and BlobSink:
The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to AzureTableSource and the sink type is set to BlobSink. The query specified with the AzureTableSourceQuery property selects the data to copy from the default partition every hour.
{
"name":"SamplePipeline",
"properties":{
"start":"2014-06-01T18:00:00",
"end":"2014-06-01T19:00:00",
"description":"pipeline for copy activity",
"activities":[
{
"name": "AzureTabletoBlob",
"description": "copy activity",
"type": "Copy",
"inputs": [
{
"name": "AzureTableInput"
}
],
"outputs": [
{
"name": "AzureBlobOutput"
}
],
"typeProperties": {
"source": {
"type": "AzureTableSource",
"AzureTableSourceQuery": "PartitionKey eq 'DefaultPartitionKey'"
},
"sink": {
"type": "BlobSink"
}
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 0,
"timeout": "01:00:00"
}
}
]
}
}
The following sample shows:
- A linked service of type AzureStorage (used for both table & blob)
- An input dataset of type AzureBlob.
- An output dataset of type AzureTable.
- The pipeline with Copy activity that uses BlobSource and AzureTableSink.
The sample copies time-series data from an Azure blob to an Azure table hourly. The JSON properties used in these samples are described in sections following the samples.
Azure storage (for both Azure Table & Blob) linked service:
{
"name": "StorageLinkedService",
"properties": {
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
}
}
}
Azure Data Factory supports two types of Azure Storage linked services: AzureStorage and AzureStorageSas. For the first one, you specify the connection string that includes the account key; for the latter one, you specify the Shared Access Signature (SAS) URI. See the Linked services section for details.
Azure Blob input dataset:
Data is picked up from a new blob every hour (frequency: hour, interval: 1). The folder path and file name for the blob are dynamically evaluated based on the start time of the slice that is being processed. The folder path uses the year, month, and day parts of the start time, and the file name uses the hour part of the start time. The "external": "true" setting informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.
{
"name": "AzureBlobInput",
"properties": {
"type": "AzureBlob",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}",
"fileName": "{Hour}.csv",
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "MM"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "dd"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "HH"
}
}
],
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"rowDelimiter": "\n"
}
},
"external": true,
"availability": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"externalData": {
"retryInterval": "00:01:00",
"retryTimeout": "00:10:00",
"maximumRetry": 3
}
}
}
}
Azure Table output dataset:
The sample copies data to a table named "MyOutputTable" in Azure Table storage. Create an Azure table with the same number of columns as you expect the Blob CSV file to contain. New rows are added to the table every hour.
{
"name": "AzureTableOutput",
"properties": {
"type": "AzureTable",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"tableName": "MyOutputTable"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Copy activity in a pipeline with BlobSource and AzureTableSink:
The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to BlobSource and sink type is set to AzureTableSink.
{
"name":"SamplePipeline",
"properties":{
"start":"2014-06-01T18:00:00",
"end":"2014-06-01T19:00:00",
"description":"pipeline with copy activity",
"activities":[
{
"name": "AzureBlobtoTable",
"description": "Copy Activity",
"type": "Copy",
"inputs": [
{
"name": "AzureBlobInput"
}
],
"outputs": [
{
"name": "AzureTableOutput"
}
],
"typeProperties": {
"source": {
"type": "BlobSource"
},
"sink": {
"type": "AzureTableSink",
"writeBatchSize": 100,
"writeBatchTimeout": "01:00:00"
}
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 0,
"timeout": "01:00:00"
}
}
]
}
}
As mentioned in the Data Movement Activities article, the Copy activity performs automatic type conversions from source types to sink types with the following two-step approach:
- Convert from native source types to .NET type
- Convert from .NET type to native sink type
When moving data to and from Azure Table, the following mappings defined by the Azure Table service are used to convert from Azure Table OData types to .NET types and vice versa.
OData Data Type | .NET Type | Details |
---|---|---|
Edm.Binary | byte[] | An array of bytes up to 64 KB. |
Edm.Boolean | bool | A Boolean value. |
Edm.DateTime | DateTime | A 64-bit value expressed as Coordinated Universal Time (UTC). The supported DateTime range begins from 12:00 midnight, January 1, 1601 A.D. (C.E.), UTC. The range ends at December 31, 9999. |
Edm.Double | double | A 64-bit floating point value. |
Edm.Guid | Guid | A 128-bit globally unique identifier. |
Edm.Int32 | Int32 | A 32-bit integer. |
Edm.Int64 | Int64 | A 64-bit integer. |
Edm.String | String | A UTF-16-encoded value. String values may be up to 64 KB. |
The following sample is for copying data from an Azure Blob to Azure Table with type conversions.
Suppose the Blob dataset is in CSV format and contains three columns. One of them is a datetime column with a custom datetime format that uses abbreviated French names for the day of the week.
Define the Blob source dataset as follows, along with type definitions for the columns.
{
"name": " AzureBlobInput",
"properties":
{
"structure":
[
{ "name": "userid", "type": "Int64"},
{ "name": "name", "type": "String"},
{ "name": "lastlogindate", "type": "Datetime", "culture": "fr-fr", "format": "ddd-MM-YYYY"}
],
"type": "AzureBlob",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"folderPath": "mycontainer/myfolder",
"fileName":"myfile.csv",
"format":
{
"type": "TextFormat",
"columnDelimiter": ","
}
},
"external": true,
"availability":
{
"frequency": "Hour",
"interval": 1,
},
"policy": {
"externalData": {
"retryInterval": "00:01:00",
"retryTimeout": "00:10:00",
"maximumRetry": 3
}
}
}
}
Given the type mapping from Azure Table OData type to .NET type, you would define the table in Azure Table with the following schema.
Azure Table schema:
Column name | Type |
---|---|
userid | Edm.Int64 |
name | Edm.String |
lastlogindate | Edm.DateTime |
Next, define the Azure Table dataset as follows. You do not need to specify the structure section with the type information, because the type information is already specified in the underlying data store.
{
"name": "AzureTableOutput",
"properties": {
"type": "AzureTable",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"tableName": "MyOutputTable"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
In this case, Data Factory automatically performs the type conversions, including converting the datetime field with the custom datetime format by using the "fr-fr" culture, when moving data from Blob to Azure Table.
> [!NOTE]
> To map columns from the source dataset to columns from the sink dataset, see Mapping dataset columns in Azure Data Factory.
To learn about key factors that impact performance of data movement (Copy Activity) in Azure Data Factory and various ways to optimize it, see Copy Activity Performance & Tuning Guide.