---
title: Move data from Salesforce by using Data Factory | Microsoft Docs
description: Learn about how to move data from Salesforce by using Azure Data Factory.
services: data-factory
documentationcenter: ''
author: linda33wj
manager: craigg
ms.assetid: dbe3bfd6-fa6a-491a-9638-3a9a10d396d1
ms.service: data-factory
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: conceptual
ms.date: 07/18/2018
ms.author: jingwang
robots: noindex
---
Note
This article applies to version 1 of Data Factory. If you are using the current version of the Data Factory service, see Salesforce connector in V2.
This article outlines how you can use Copy Activity in an Azure data factory to copy data from Salesforce to any data store that is listed under the Sink column in the supported sources and sinks table. This article builds on the data movement activities article, which presents a general overview of data movement with Copy Activity and supported data store combinations.
Azure Data Factory currently supports only moving data from Salesforce to supported sink data stores, but does not support moving data from other data stores to Salesforce.
This connector supports the following Salesforce editions: Developer Edition, Professional Edition, Enterprise Edition, and Unlimited Edition. It also supports copying from Salesforce production, sandbox, and custom domains.
- API permission must be enabled. See How do I enable API access in Salesforce by permission set?
- To copy data from Salesforce to on-premises data stores, you must have at least Data Management Gateway 2.0 installed in your on-premises environment.
Salesforce has limits for both total API requests and concurrent API requests. Note the following points:
- If the number of concurrent requests exceeds the limit, throttling occurs and you will see random failures.
- If the total number of requests exceeds the limit, the Salesforce account will be blocked for 24 hours.
You might also receive the "REQUEST_LIMIT_EXCEEDED" error in both scenarios. See the "API Request Limits" section in the Salesforce Developer Limits article for details.
You can create a pipeline with a copy activity that moves data from Salesforce by using different tools/APIs.
The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline using the Copy data wizard.
You can also use the following tools to create a pipeline: Azure portal, Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.
Whether you use the tools or APIs, you perform the following steps to create a pipeline that moves data from a source data store to a sink data store:
- Create linked services to link input and output data stores to your data factory.
- Create datasets to represent input and output data for the copy operation.
- Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output.
When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools/APIs (except .NET API), you define these Data Factory entities by using the JSON format. For a sample with JSON definitions for Data Factory entities that are used to copy data from Salesforce, see JSON example: Copy data from Salesforce to Azure Blob section of this article.
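As a schematic of how these entities reference each other (the names here are placeholders), the copy activity in a pipeline refers to its input and output datasets by name, and each dataset in turn refers to its linked service by name. Complete, working definitions appear in the JSON example: Copy data from Salesforce to Azure Blob section later in this article.
{
    "name": "SamplePipeline",
    "properties": {
        "activities": [
            {
                "name": "SalesforceToBlob",
                "type": "Copy",
                "inputs": [ { "name": "SalesforceInputDataset" } ],
                "outputs": [ { "name": "BlobOutputDataset" } ],
                "typeProperties": {
                    "source": { "type": "RelationalSource" },
                    "sink": { "type": "BlobSink" }
                }
            }
        ]
    }
}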
The following sections provide details about JSON properties that are used to define Data Factory entities specific to Salesforce:
The following table provides descriptions for JSON elements that are specific to the Salesforce linked service.
Property | Description | Required |
---|---|---|
type | The type property must be set to: Salesforce. | Yes |
environmentUrl | Specify the URL of the Salesforce instance. - Default is "https://login.salesforce.com". - To copy data from sandbox, specify "https://test.salesforce.com". - To copy data from a custom domain, specify, for example, "https://[domain].my.salesforce.com". | No |
username | Specify a user name for the user account. | Yes |
password | Specify a password for the user account. | Yes |
securityToken | Specify a security token for the user account. See Get security token for instructions on how to reset/get a security token. To learn about security tokens in general, see Security and the API. | Yes |
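For example, a Salesforce linked service that copies from a sandbox might look like the following sketch. The account values are placeholders, and environmentUrl can be omitted when you connect to the default "https://login.salesforce.com".
{
    "name": "SalesforceSandboxLinkedService",
    "properties": {
        "type": "Salesforce",
        "typeProperties": {
            "environmentUrl": "https://test.salesforce.com",
            "username": "<user name>",
            "password": "<password>",
            "securityToken": "<security token>"
        }
    }
}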
For a full list of sections and properties that are available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types (Azure SQL, Azure blob, Azure table, and so on).
The typeProperties section is different for each type of dataset and provides information about the location of the data in the data store. The typeProperties section for a dataset of the type RelationalTable has the following properties:
Property | Description | Required |
---|---|---|
tableName | Name of the table in Salesforce. | No (if a query of RelationalSource is specified) |
Important
The "__c" part of the API Name is needed for any custom object.
For a full list of sections and properties that are available for defining activities, see the Creating pipelines article. Properties like name, description, input and output tables, and various policies are available for all types of activities.
The properties that are available in the typeProperties section of the activity, on the other hand, vary with each activity type. For Copy Activity, they vary depending on the types of sources and sinks.
In copy activity, when the source is of the type RelationalSource (which includes Salesforce), the following properties are available in typeProperties section:
Property | Description | Allowed values | Required |
---|---|---|---|
query | Use the custom query to read data. | A SQL-92 query or Salesforce Object Query Language (SOQL) query. For example: select * from MyTable__c. | No (if the tableName of the dataset is specified) |
Important
The "__c" part of the API Name is needed for any custom object.
When you specify the SOQL or SQL query, pay attention to the DateTime format difference; a sketch of a copy activity source that embeds such a query follows these samples. For example:
- SOQL sample:
$$Text.Format('SELECT Id, Name, BillingCity FROM Account WHERE LastModifiedDate >= {0:yyyy-MM-ddTHH:mm:ssZ} AND LastModifiedDate < {1:yyyy-MM-ddTHH:mm:ssZ}', WindowStart, WindowEnd)
- SQL sample:
- Using copy wizard to specify the query:
$$Text.Format('SELECT * FROM Account WHERE LastModifiedDate >= {{ts\'{0:yyyy-MM-dd HH:mm:ss}\'}} AND LastModifiedDate < {{ts\'{1:yyyy-MM-dd HH:mm:ss}\'}}', WindowStart, WindowEnd)
- Using JSON editing to specify the query (escape char properly):
$$Text.Format('SELECT * FROM Account WHERE LastModifiedDate >= {{ts\\'{0:yyyy-MM-dd HH:mm:ss}\\'}} AND LastModifiedDate < {{ts\\'{1:yyyy-MM-dd HH:mm:ss}\\'}}', WindowStart, WindowEnd)
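Putting this together, a copy activity source that embeds the JSON-escaped SQL sample above might look like the following sketch; WindowStart and WindowEnd are the same system variables used in the samples:
"source": {
    "type": "RelationalSource",
    "query": "$$Text.Format('SELECT * FROM Account WHERE LastModifiedDate >= {{ts\\'{0:yyyy-MM-dd HH:mm:ss}\\'}} AND LastModifiedDate < {{ts\\'{1:yyyy-MM-dd HH:mm:ss}\\'}}', WindowStart, WindowEnd)"
}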
You can retrieve data from Salesforce reports by specifying the query as {call "<report name>"}. For example: "query": "{call \"TestReport\"}".
To query the soft-deleted records in the Salesforce Recycle Bin, you can specify "IsDeleted = 1" in your query. For example:
- To query only the deleted records, specify "select * from MyTable__c where IsDeleted = 1"
- To query all records, both existing and deleted, specify "select * from MyTable__c where IsDeleted = 0 or IsDeleted = 1"
The following example provides sample JSON definitions that you can use to create a pipeline by using the Azure portal, Visual Studio, or Azure PowerShell. They show how to copy data from Salesforce to Azure Blob Storage. However, data can be copied to any of the sinks listed in the supported sources and sinks table by using Copy Activity in Azure Data Factory.
Here are the Data Factory artifacts that you'll need to create to implement the scenario. The sections that follow the list provide details about these entities.
- A linked service of the type Salesforce
- A linked service of the type AzureStorage
- An input dataset of the type RelationalTable
- An output dataset of the type AzureBlob
- A pipeline with Copy Activity that uses RelationalSource and BlobSink
Salesforce linked service
This example uses the Salesforce linked service. See the Salesforce linked service section for the properties that are supported by this linked service. See Get security token for instructions on how to reset/get the security token.
{
"name": "SalesforceLinkedService",
"properties":
{
"type": "Salesforce",
"typeProperties":
{
"username": "<user name>",
"password": "<password>",
"securityToken": "<security token>"
}
}
}
Azure Storage linked service
{
"name": "AzureStorageLinkedService",
"properties": {
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
}
}
}
Salesforce input dataset
{
"name": "SalesforceInput",
"properties": {
"linkedServiceName": "SalesforceLinkedService",
"type": "RelationalTable",
"typeProperties": {
"tableName": "AllDataType__c"
},
"availability": {
"frequency": "Hour",
"interval": 1
},
"external": true,
"policy": {
"externalData": {
"retryInterval": "00:01:00",
"retryTimeout": "00:10:00",
"maximumRetry": 3
}
}
}
}
Setting external to true informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.
Important
The "__c" part of the API Name is needed for any custom object.
Azure blob output dataset
Data is written to a new blob every hour (frequency: hour, interval: 1).
{
"name": "AzureBlobOutput",
"properties":
{
"type": "AzureBlob",
"linkedServiceName": "AzureStorageLinkedService",
"typeProperties":
{
"folderPath": "adfgetstarted/alltypes_c"
},
"availability":
{
"frequency": "Hour",
"interval": 1
}
}
}
Pipeline with Copy Activity
The pipeline contains Copy Activity, which is configured to use the input and output datasets, and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to RelationalSource, and the sink type is set to BlobSink.
See RelationalSource type properties for the list of properties that are supported by the RelationalSource.
{
"name":"SamplePipeline",
"properties":{
"start":"2016-06-01T18:00:00",
"end":"2016-06-01T19:00:00",
"description":"pipeline with copy activity",
"activities":[
{
"name": "SalesforceToAzureBlob",
"description": "Copy from Salesforce to an Azure blob",
"type": "Copy",
"inputs": [
{
"name": "SalesforceInput"
}
],
"outputs": [
{
"name": "AzureBlobOutput"
}
],
"typeProperties": {
"source": {
"type": "RelationalSource",
"query": "SELECT Id, Col_AutoNumber__c, Col_Checkbox__c, Col_Currency__c, Col_Date__c, Col_DateTime__c, Col_Email__c, Col_Number__c, Col_Percent__c, Col_Phone__c, Col_Picklist__c, Col_Picklist_MultiSelect__c, Col_Text__c, Col_Text_Area__c, Col_Text_AreaLong__c, Col_Text_AreaRich__c, Col_URL__c, Col_Text_Encrypt__c, Col_Lookup__c FROM AllDataType__c"
},
"sink": {
"type": "BlobSink"
}
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 0,
"timeout": "01:00:00"
}
}
]
}
}
Important
The "__c" part of the API Name is needed for any custom object.
When you copy data from Salesforce, the following mappings are used from Salesforce types to .NET-based types.
Salesforce type | .NET-based type |
---|---|
Auto Number | String |
Checkbox | Boolean |
Currency | Decimal |
Date | DateTime |
Date/Time | DateTime |
Email | String |
Id | String |
Lookup Relationship | String |
Multi-Select Picklist | String |
Number | Decimal |
Percent | Decimal |
Phone | String |
Picklist | String |
Text | String |
Text Area | String |
Text Area (Long) | String |
Text Area (Rich) | String |
Text (Encrypted) | String |
URL | String |
Note
To map columns from source dataset to columns from sink dataset, see Mapping dataset columns in Azure Data Factory.
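As a hedged sketch of such a mapping (the column names here are illustrative), you can add a translator of type TabularTranslator to the typeProperties section of the copy activity, alongside the source and sink:
"typeProperties": {
    "source": {
        "type": "RelationalSource",
        "query": "SELECT Id, Name FROM Account"
    },
    "sink": {
        "type": "BlobSink"
    },
    "translator": {
        "type": "TabularTranslator",
        "columnMappings": "Id: AccountId, Name: AccountName"
    }
}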
[!INCLUDE data-factory-structure-for-rectangualr-datasets]
See the Copy Activity performance and tuning guide to learn about key factors that impact performance of data movement (Copy Activity) in Azure Data Factory and various ways to optimize it.