---
title: Move data to/from Azure Table | Microsoft Docs
description: Learn how to move data to/from Azure Table Storage using Azure Data Factory.
services: data-factory
documentationcenter: ''
author: linda33wj
manager: craigg
ms.assetid: 07b046b1-7884-4e57-a613-337292416319
ms.service: data-factory
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: conceptual
ms.date: 01/22/2018
ms.author: jingwang
robots: noindex
---
> [!NOTE]
> This article applies to version 1 of Data Factory. If you are using the current version of the Data Factory service, see the Azure Table Storage connector in V2.
This article explains how to use the Copy Activity in Azure Data Factory to move data to/from Azure Table Storage. It builds on the Data Movement Activities article, which presents a general overview of data movement with the copy activity.
You can copy data from any supported source data store to Azure Table Storage or from Azure Table Storage to any supported sink data store. For a list of data stores supported as sources or sinks by the copy activity, see the Supported data stores table.
You can create a pipeline with a copy activity that moves data to/from Azure Table Storage by using different tools/APIs.
The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline using the Copy data wizard.
You can also use the following tools to create a pipeline: Azure portal, Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.
Whether you use the tools or APIs, you perform the following steps to create a pipeline that moves data from a source data store to a sink data store:
- Create linked services to link input and output data stores to your data factory.
- Create datasets to represent input and output data for the copy operation.
- Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output.
When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools/APIs (except .NET API), you define these Data Factory entities by using the JSON format. For samples with JSON definitions for Data Factory entities that are used to copy data to/from Azure Table Storage, see the JSON examples section of this article.
The following sections provide details about JSON properties that are used to define Data Factory entities specific to Azure Table Storage:
There are two types of linked services you can use to link an Azure storage account to an Azure data factory: the AzureStorage linked service and the AzureStorageSas linked service. The AzureStorage linked service provides the data factory with global access to the Azure Storage account, whereas the AzureStorageSas (Shared Access Signature) linked service provides the data factory with restricted, time-bound access. There are no other differences between these two linked services. Choose the linked service that suits your needs. The following sections provide more details on these two linked services.
[!INCLUDE data-factory-azure-storage-linked-services]
For a full list of sections & properties available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types (Azure SQL, Azure blob, Azure table, etc.).
The typeProperties section is different for each type of dataset and provides information about the location of the data in the data store. The typeProperties section for the dataset of type AzureTable has the following properties.
Property | Description | Required |
---|---|---|
tableName | Name of the table in the Azure Table storage instance that the linked service refers to. | Yes. When a tableName is specified without an azureTableSourceQuery, all records from the table are copied to the destination. If an azureTableSourceQuery is also specified, only the records that satisfy the query are copied to the destination. |
For schema-free data stores such as Azure Table, the Data Factory service infers the schema in one of the following ways:
- If you specify the structure of data by using the structure property in the dataset definition, the Data Factory service honors this structure as the schema. In this case, if a row does not contain a value for a column, a null value is provided for it.
- If you don't specify the structure of data by using the structure property in the dataset definition, Data Factory infers the schema by using the first row in the data. In this case, if the first row does not contain the full schema, some columns are missing in the result of the copy operation.
Therefore, for schema-free data sources, the best practice is to specify the structure of data using the structure property.
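As a minimal sketch (the column names here are hypothetical placeholders, not values from the samples later in this article), an AzureTable dataset that pins the schema with the structure property could look like the following:
{
    "name": "AzureTableInputWithStructure",
    "properties": {
        "type": "AzureTable",
        "linkedServiceName": "StorageLinkedService",
        "structure": [
            { "name": "DeviceId", "type": "String" },
            { "name": "Temperature", "type": "Double" },
            { "name": "ReadingTime", "type": "Datetime" }
        ],
        "typeProperties": {
            "tableName": "MyTable"
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}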
For a full list of sections & properties available for defining activities, see the Creating Pipelines article. Properties such as name, description, input and output datasets, and policies are available for all types of activities.
Properties available in the typeProperties section of the activity, on the other hand, vary with each activity type. For the Copy activity, they vary depending on the types of sources and sinks.
AzureTableSource supports the following properties in the typeProperties section:
Property | Description | Allowed values | Required |
---|---|---|---|
azureTableSourceQuery | Use the custom query to read data. | Azure table query string. See examples in the next section. | No. When a tableName is specified without an azureTableSourceQuery, all records from the table are copied to the destination. If an azureTableSourceQuery is also specified, only the records that satisfy the query are copied to the destination. |
azureTableSourceIgnoreTableNotFound | Indicates whether to swallow the exception when the table does not exist. | TRUE, FALSE | No |
If the Azure Table column is of the string type:
"azureTableSourceQuery": "$$Text.Format('PartitionKey ge \\'{0:yyyyMMddHH00_0000}\\' and PartitionKey le \\'{0:yyyyMMddHH00_9999}\\'', SliceStart)"
If the Azure Table column is of the datetime type:
"azureTableSourceQuery": "$$Text.Format('DeploymentEndTime gt datetime\\'{0:yyyy-MM-ddTHH:mm:ssZ}\\' and DeploymentEndTime le datetime\\'{1:yyyy-MM-ddTHH:mm:ssZ}\\'', SliceStart, SliceEnd)"
AzureTableSink supports the following properties in the typeProperties section:
Property | Description | Allowed values | Required |
---|---|---|---|
azureTableDefaultPartitionKeyValue | Default partition key value that can be used by the sink. | A string value. | No |
azureTablePartitionKeyName | Specify the name of the column whose values are used as partition keys. If not specified, azureTableDefaultPartitionKeyValue is used as the partition key. | A column name. | No |
azureTableRowKeyName | Specify the name of the column whose values are used as the row key. If not specified, a GUID is used for each row. | A column name. | No |
azureTableInsertType | The mode to insert data into the Azure table. This property controls whether existing rows in the output table with matching partition and row keys have their values replaced or merged. To learn about how these settings (merge and replace) work, see the Insert or Merge Entity and Insert or Replace Entity topics. This setting applies at the row level, not the table level, and neither option deletes rows in the output table that do not exist in the input. | merge (default), replace | No |
writeBatchSize | Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit. | Integer (number of rows) | No (default: 10000) |
writeBatchTimeout | Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit. | timespan. Example: "00:20:00" (20 minutes) | No (default: the storage client default timeout value of 90 seconds) |
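As an illustrative sketch only (the Region and OrderId column names are hypothetical), a sink that replaces matching rows instead of merging them, and that derives the partition and row keys from columns, could look like this. As described next, the partition key column must also be mapped with the translator property.
"sink": {
    "type": "AzureTableSink",
    "azureTableInsertType": "replace",
    "azureTablePartitionKeyName": "Region",
    "azureTableRowKeyName": "OrderId",
    "writeBatchSize": 100,
    "writeBatchTimeout": "00:20:00"
}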
Before you can use a destination column as the azureTablePartitionKeyName, you must map a source column to the destination column by using the translator JSON property.
In the following example, the source column DivisionID is mapped to the destination column DivisionID.
"translator": {
"type": "TabularTranslator",
"columnMappings": "DivisionID: DivisionID, FirstName: FirstName, LastName: LastName"
}
The DivisionID column is specified as the partition key:
"sink": {
"type": "AzureTableSink",
"azureTablePartitionKeyName": "DivisionID",
"writeBatchSize": 100,
"writeBatchTimeout": "01:00:00"
}
The following examples provide sample JSON definitions that you can use to create a pipeline by using the Azure portal, Visual Studio, or Azure PowerShell. They show how to copy data to and from Azure Table Storage and Azure Blob Storage. However, data can be copied directly from any of the sources to any of the supported sinks. For more information, see the section "Supported data stores and formats" in Move data by using Copy Activity.
The following sample shows:
- A linked service of type AzureStorage (used for both table & blob).
- An input dataset of type AzureTable.
- An output dataset of type AzureBlob.
- The pipeline with Copy activity that uses AzureTableSource and BlobSink.
The sample copies data belonging to the default partition in an Azure Table to a blob every hour. The JSON properties used in these samples are described in sections following the samples.
Azure storage linked service:
{
"name": "StorageLinkedService",
"properties": {
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
}
}
}
Azure Data Factory supports two types of Azure Storage linked services: AzureStorage and AzureStorageSas. For the first one, you specify the connection string that includes the account key; for the latter one, you specify the Shared Access Signature (SAS) URI. See the Linked services section for details.
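For reference, a linked service of type AzureStorageSas follows the same shape but takes a SAS URI instead of a connection string; the sketch below uses a placeholder value that you replace with a valid SAS URI for your storage resource:
{
    "name": "StorageSasLinkedService",
    "properties": {
        "type": "AzureStorageSas",
        "typeProperties": {
            "sasUri": "<Specify SAS URI of the Azure Storage resource>"
        }
    }
}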
Azure Table input dataset:
The sample assumes you have created a table "MyTable" in Azure Table storage.
Setting "external": "true" informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.
{
"name": "AzureTableInput",
"properties": {
"type": "AzureTable",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"tableName": "MyTable"
},
"external": true,
"availability": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"externalData": {
"retryInterval": "00:01:00",
"retryTimeout": "00:10:00",
"maximumRetry": 3
}
}
}
}
Azure Blob output dataset:
Data is written to a new blob every hour (frequency: hour, interval: 1). The folder path for the blob is dynamically evaluated based on the start time of the slice that is being processed. The folder path uses the year, month, day, and hour parts of the start time.
{
"name": "AzureBlobOutput",
"properties": {
"type": "AzureBlob",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}",
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "MM"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "dd"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "HH"
}
}
],
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"rowDelimiter": "\n"
}
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Copy activity in a pipeline with AzureTableSource and BlobSink:
The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to AzureTableSource and the sink type is set to BlobSink. The query specified with the AzureTableSourceQuery property selects the data to copy from the default partition every hour.
{
"name":"SamplePipeline",
"properties":{
"start":"2014-06-01T18:00:00",
"end":"2014-06-01T19:00:00",
"description":"pipeline for copy activity",
"activities":[
{
"name": "AzureTabletoBlob",
"description": "copy activity",
"type": "Copy",
"inputs": [
{
"name": "AzureTableInput"
}
],
"outputs": [
{
"name": "AzureBlobOutput"
}
],
"typeProperties": {
"source": {
"type": "AzureTableSource",
"AzureTableSourceQuery": "PartitionKey eq 'DefaultPartitionKey'"
},
"sink": {
"type": "BlobSink"
}
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 0,
"timeout": "01:00:00"
}
}
]
}
}
The following sample shows:
- A linked service of type AzureStorage (used for both table & blob)
- An input dataset of type AzureBlob.
- An output dataset of type AzureTable.
- The pipeline with Copy activity that uses BlobSource and AzureTableSink.
The sample copies time-series data from an Azure blob to an Azure table hourly. The JSON properties used in these samples are described in sections following the samples.
Azure storage (for both Azure Table & Blob) linked service:
{
"name": "StorageLinkedService",
"properties": {
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
}
}
}
Azure Data Factory supports two types of Azure Storage linked services: AzureStorage and AzureStorageSas. For the first one, you specify the connection string that includes the account key; for the latter one, you specify the Shared Access Signature (SAS) URI. See the Linked services section for details.
Azure Blob input dataset:
Data is picked up from a new blob every hour (frequency: hour, interval: 1). The folder path and file name for the blob are dynamically evaluated based on the start time of the slice that is being processed. The folder path uses the year, month, and day parts of the start time, and the file name uses the hour part of the start time. The "external": "true" setting informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.
{
"name": "AzureBlobInput",
"properties": {
"type": "AzureBlob",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}",
"fileName": "{Hour}.csv",
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "MM"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "dd"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "HH"
}
}
],
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"rowDelimiter": "\n"
}
},
"external": true,
"availability": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"externalData": {
"retryInterval": "00:01:00",
"retryTimeout": "00:10:00",
"maximumRetry": 3
}
}
}
}
Azure Table output dataset:
The sample copies data to a table named "MyOutputTable" in Azure Table storage. Create an Azure table with the same number of columns as you expect the Blob CSV file to contain. New rows are added to the table every hour.
{
"name": "AzureTableOutput",
"properties": {
"type": "AzureTable",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"tableName": "MyOutputTable"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Copy activity in a pipeline with BlobSource and AzureTableSink:
The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to BlobSource and sink type is set to AzureTableSink.
{
"name":"SamplePipeline",
"properties":{
"start":"2014-06-01T18:00:00",
"end":"2014-06-01T19:00:00",
"description":"pipeline with copy activity",
"activities":[
{
"name": "AzureBlobtoTable",
"description": "Copy Activity",
"type": "Copy",
"inputs": [
{
"name": "AzureBlobInput"
}
],
"outputs": [
{
"name": "AzureTableOutput"
}
],
"typeProperties": {
"source": {
"type": "BlobSource"
},
"sink": {
"type": "AzureTableSink",
"writeBatchSize": 100,
"writeBatchTimeout": "01:00:00"
}
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 0,
"timeout": "01:00:00"
}
}
]
}
}
As mentioned in the Data Movement Activities article, the Copy activity performs automatic type conversions from source types to sink types with the following two-step approach:
- Convert from native source types to .NET type
- Convert from .NET type to native sink type
When moving data to and from Azure Table, the following mappings defined by the Azure Table service are used to convert from Azure Table OData types to .NET types and vice versa.
OData Data Type | .NET Type | Details |
---|---|---|
Edm.Binary | byte[] | An array of bytes up to 64 KB. |
Edm.Boolean | bool | A Boolean value. |
Edm.DateTime | DateTime | A 64-bit value expressed as Coordinated Universal Time (UTC). The supported DateTime range begins from 12:00 midnight, January 1, 1601 A.D. (C.E.), UTC. The range ends at December 31, 9999. |
Edm.Double | double | A 64-bit floating point value. |
Edm.Guid | Guid | A 128-bit globally unique identifier. |
Edm.Int32 | Int32 | A 32-bit integer. |
Edm.Int64 | Int64 | A 64-bit integer. |
Edm.String | String | A UTF-16-encoded value. String values may be up to 64 KB. |
The following sample is for copying data from an Azure Blob to Azure Table with type conversions.
Suppose the Blob dataset is in CSV format and contains three columns. One of them is a datetime column with a custom datetime format that uses abbreviated French names for the day of the week.
Define the Blob source dataset as follows, along with type definitions for the columns.
{
"name": " AzureBlobInput",
"properties":
{
"structure":
[
{ "name": "userid", "type": "Int64"},
{ "name": "name", "type": "String"},
{ "name": "lastlogindate", "type": "Datetime", "culture": "fr-fr", "format": "ddd-MM-YYYY"}
],
"type": "AzureBlob",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"folderPath": "mycontainer/myfolder",
"fileName":"myfile.csv",
"format":
{
"type": "TextFormat",
"columnDelimiter": ","
}
},
"external": true,
"availability":
{
"frequency": "Hour",
"interval": 1,
},
"policy": {
"externalData": {
"retryInterval": "00:01:00",
"retryTimeout": "00:10:00",
"maximumRetry": 3
}
}
}
}
Given the type mapping from Azure Table OData type to .NET type, you would define the table in Azure Table with the following schema.
Azure Table schema:
Column name | Type |
---|---|
userid | Edm.Int64 |
name | Edm.String |
lastlogindate | Edm.DateTime |
Next, define the Azure Table dataset as follows. You do not need to specify the structure section with the type information, because the type information is already specified in the underlying data store.
{
"name": "AzureTableOutput",
"properties": {
"type": "AzureTable",
"linkedServiceName": "StorageLinkedService",
"typeProperties": {
"tableName": "MyOutputTable"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
In this case, Data Factory automatically performs the type conversions, including converting the datetime field with the custom datetime format by using the "fr-fr" culture, when moving data from Blob to Azure Table.
> [!NOTE]
> To map columns from the source dataset to columns from the sink dataset, see Mapping dataset columns in Azure Data Factory.
To learn about key factors that impact performance of data movement (Copy Activity) in Azure Data Factory and various ways to optimize it, see Copy Activity Performance & Tuning Guide.