Field | Value |
---|---|
title | Copy data from an HTTP source by using Azure Data Factory \| Microsoft Docs |
description | Learn how to copy data from a cloud or on-premises HTTP source to supported sink data stores by using a copy activity in an Azure Data Factory pipeline. |
services | data-factory |
documentationcenter | |
author | linda33wj |
manager | craigg |
ms.reviewer | douglasl |
ms.service | data-factory |
ms.workload | data-services |
ms.tgt_pltfrm | na |
ms.devlang | na |
ms.topic | conceptual |
ms.date | 08/24/2018 |
ms.author | jingwang |
This article outlines how to use Copy Activity in Azure Data Factory to copy data from an HTTP endpoint. The article builds on Copy Activity in Azure Data Factory, which presents a general overview of Copy Activity.
You can copy data from an HTTP source to any supported sink data store. For a list of data stores that Copy Activity supports as sources and sinks, see Supported data stores and formats.
You can use this HTTP connector to:
- Retrieve data from an HTTP/S endpoint by using the HTTP GET or POST methods.
- Retrieve data by using one of the following authentication types: Anonymous, Basic, Digest, Windows, or ClientCertificate.
- Copy the HTTP response as-is or parse it by using supported file formats and compression codecs.
The difference between this connector and the Web table connector is that the Web table connector extracts table content from an HTML webpage.
Tip
Before you configure the HTTP connector in Data Factory, review the API specification for the endpoint's header and body requirements, and then test the HTTP request for data retrieval. You can use a tool like Postman or a web browser to validate the request.
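As an alternative to Postman, a request can be assembled and inspected in a short script before it is sent. The following is a minimal sketch that uses only the Python standard library. The URL, the header value, and the `build_request` helper are hypothetical placeholders and are not part of Data Factory.

```python
import urllib.request


def build_request(url, method="GET", headers=None, body=None):
    """Assemble (but do not send) an HTTP request for manual testing."""
    data = body.encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, headers=headers or {}, method=method)


# Hypothetical endpoint and header; replace with values from your API specification.
request = build_request(
    "https://example.com/api/data",
    headers={"User-Agent": "Mozilla/5.0"},
)
print(request.get_method(), request.full_url)

# To actually send the request (requires network access), uncomment:
# with urllib.request.urlopen(request, timeout=100) as response:
#     print(response.status, response.read()[:200])
```

Once the endpoint responds as expected, the same method, headers, and body can be carried over to the dataset properties described later in this article.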
[!INCLUDE data-factory-v2-connector-get-started]
The following sections provide details about properties you can use to define Data Factory entities that are specific to the HTTP connector.
The following properties are supported for the HTTP linked service:
Property | Description | Required |
---|---|---|
type | The type property must be set to HttpServer. | Yes |
url | The base URL to the web server. | Yes |
enableServerCertificateValidation | Specify whether to enable server SSL certificate validation when you connect to an HTTPS endpoint. If your HTTPS server uses a self-signed certificate, set this property to false. | No (the default is true) |
authenticationType | Specifies the authentication type. Allowed values are Anonymous, Basic, Digest, Windows, and ClientCertificate. See the sections that follow this table for more properties and JSON samples for these authentication types. | Yes |
connectVia | The Integration Runtime to use to connect to the data store. You can use the Azure Integration Runtime or a self-hosted Integration Runtime (if your data store is located in a private network). If not specified, this property uses the default Azure Integration Runtime. | No |
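The properties in the preceding table can be combined into a minimal linked service definition. The sketch below builds one for Anonymous authentication and rejects authentication types that the table does not list. The `http_linked_service` helper and the example URL are hypothetical, and the sketch assumes that enableServerCertificateValidation sits in typeProperties alongside url and authenticationType.

```python
import json

# Allowed values taken from the authenticationType row of the table above.
ALLOWED_AUTH_TYPES = {"Anonymous", "Basic", "Digest", "Windows", "ClientCertificate"}


def http_linked_service(name, url, authentication_type="Anonymous",
                        validate_server_certificate=True):
    """Return an HTTP linked service definition as a dict."""
    if authentication_type not in ALLOWED_AUTH_TYPES:
        raise ValueError(f"Unsupported authenticationType: {authentication_type}")
    return {
        "name": name,
        "properties": {
            "type": "HttpServer",
            "typeProperties": {
                "url": url,
                "authenticationType": authentication_type,
                # Assumed location of this property (not shown in the article's samples).
                "enableServerCertificateValidation": validate_server_certificate,
            },
        },
    }


# Hypothetical name and URL. connectVia is omitted, so the default
# Azure Integration Runtime is used.
definition = http_linked_service(
    "HttpLinkedService", "https://example.com/", validate_server_certificate=False
)
print(json.dumps(definition, indent=4))
```

Validation is disabled here only to illustrate the self-signed certificate case. Leave the property at its default of true for servers with a trusted certificate.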
To use Basic, Digest, or Windows authentication, set the authenticationType property to Basic, Digest, or Windows. In addition to the generic properties that are described in the preceding section, specify the following properties:
Property | Description | Required |
---|---|---|
userName | The user name to use to access the HTTP endpoint. | Yes |
password | The password for the user (the userName value). Mark this field as a SecureString type to store it securely in Data Factory. You can also reference a secret stored in Azure Key Vault. | Yes |
Example
{
    "name": "HttpLinkedService",
    "properties": {
        "type": "HttpServer",
        "typeProperties": {
            "authenticationType": "Basic",
            "url": "<HTTP endpoint>",
            "userName": "<user name>",
            "password": {
                "type": "SecureString",
                "value": "<password>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
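The password row also mentions referencing a secret stored in Azure Key Vault instead of an inline SecureString. The sketch below shows what that reference might look like when the definition is generated from a script. The `AzureKeyVaultSecret` shape is not given in this article and is an assumption here, so verify it against the Data Factory documentation on storing credentials in Key Vault. The `key_vault_secret` helper is hypothetical.

```python
import json


def key_vault_secret(key_vault_linked_service, secret_name):
    """Build a reference to a secret stored in Azure Key Vault (assumed shape)."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": key_vault_linked_service,
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


# Same typeProperties as the Basic example above, with the password
# replaced by a Key Vault reference. All bracketed values are placeholders.
type_properties = {
    "authenticationType": "Basic",
    "url": "<HTTP endpoint>",
    "userName": "<user name>",
    "password": key_vault_secret("<Azure Key Vault linked service name>", "<secret name>"),
}
print(json.dumps(type_properties, indent=4))
```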
To use ClientCertificate authentication, set the authenticationType property to ClientCertificate. In addition to the generic properties that are described in the preceding section, specify the following properties:
Property | Description | Required |
---|---|---|
embeddedCertData | Base64-encoded certificate data. | Specify either embeddedCertData or certThumbprint. |
certThumbprint | The thumbprint of the certificate that's installed on your self-hosted Integration Runtime machine's cert store. Applies only when the self-hosted type of Integration Runtime is specified in the connectVia property. | Specify either embeddedCertData or certThumbprint. |
password | The password that's associated with the certificate. Mark this field as a SecureString type to store it securely in Data Factory. You can also reference a secret stored in Azure Key Vault. | No |
If you use certThumbprint for authentication and the certificate is installed in the personal store of the local computer, grant read permissions to the self-hosted Integration Runtime:
- Open the Microsoft Management Console (MMC). Add the Certificates snap-in that targets Local Computer.
- Expand Certificates > Personal, and then select Certificates.
- Right-click the certificate from the personal store, and then select All Tasks > Manage Private Keys.
- On the Security tab, add the user account under which the Integration Runtime Host Service (DIAHostService) is running, with read access to the certificate.
Example 1: Using certThumbprint
{
    "name": "HttpLinkedService",
    "properties": {
        "type": "HttpServer",
        "typeProperties": {
            "authenticationType": "ClientCertificate",
            "url": "<HTTP endpoint>",
            "certThumbprint": "<thumbprint of certificate>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
Example 2: Using embeddedCertData
{
    "name": "HttpLinkedService",
    "properties": {
        "type": "HttpServer",
        "typeProperties": {
            "authenticationType": "ClientCertificate",
            "url": "<HTTP endpoint>",
            "embeddedCertData": "<Base64-encoded cert data>",
            "password": {
                "type": "SecureString",
                "value": "<password of cert>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
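The embeddedCertData value is Base64-encoded certificate data. One way to produce it is sketched below, assuming that the certificate is available as a file (for example, a .pfx file) and that the value is the Base64 encoding of the raw file bytes. The `encode_certificate` helper and the file name are hypothetical, and the demonstration uses placeholder bytes rather than a real certificate.

```python
import base64
import tempfile
from pathlib import Path


def encode_certificate(path):
    """Return the Base64 text of a certificate file for embeddedCertData."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Demonstration with placeholder bytes standing in for a real certificate file.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "client.pfx"  # hypothetical file name
    sample.write_bytes(b"not a real certificate")
    embedded_cert_data = encode_certificate(sample)

print(embedded_cert_data)
```

The resulting string replaces the `<Base64-encoded cert data>` placeholder in Example 2.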
This section provides a list of properties that the HTTP dataset supports.
For a full list of sections and properties that are available for defining datasets, see Datasets and linked services.
To copy data from HTTP, set the type property of the dataset to HttpFile. The following properties are supported:
Property | Description | Required |
---|---|---|
type | The type property of the dataset must be set to HttpFile. | Yes |
relativeUrl | A relative URL to the resource that contains the data. When this property isn't specified, only the URL that's specified in the linked service definition is used. | No |
requestMethod | The HTTP method. Allowed values are Get (default) and Post. | No |
additionalHeaders | Additional HTTP request headers. | No |
requestBody | The body for the HTTP request. | No |
format | If you want to retrieve data from the HTTP endpoint as-is without parsing it, and then copy the data to a file-based store, skip the format section in both the input and output dataset definitions. If you want to parse the HTTP response content during copy, the following file format types are supported: TextFormat, JsonFormat, AvroFormat, OrcFormat, and ParquetFormat. Under format, set the type property to one of these values. For more information, see JSON format, Text format, Avro format, Orc format, and Parquet format. | No |
compression | Specify the type and level of compression for the data. For more information, see Supported file formats and compression codecs. Supported types: GZip, Deflate, BZip2, and ZipDeflate. Supported levels: Optimal and Fastest. | No |
Note
The supported HTTP request payload size is around 500 KB. If the payload size you want to pass to your web endpoint is larger than 500 KB, consider batching the payload in smaller chunks.
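The batching suggested in the note can be sketched as follows. This example groups JSON-serializable records into batches whose encoded size stays within the limit, so that each batch can be sent as a separate request body. It assumes that 500 KB means 500 * 1024 bytes (the note says only "around 500 KB") and that the payload is a JSON array. The `batch_records` helper and the sample records are hypothetical.

```python
import json

MAX_PAYLOAD_BYTES = 500 * 1024  # assumed interpretation of "around 500 KB"


def _size(batch):
    """Size in bytes of a batch once serialized as UTF-8 JSON."""
    return len(json.dumps(batch).encode("utf-8"))


def batch_records(records, max_bytes=MAX_PAYLOAD_BYTES):
    """Group records into lists whose JSON encoding stays within max_bytes."""
    batches, current = [], []
    for record in records:
        if _size([record]) > max_bytes:
            raise ValueError("A single record exceeds the payload limit")
        # Start a new batch when adding this record would exceed the limit.
        if current and _size(current + [record]) > max_bytes:
            batches.append(current)
            current = []
        current.append(record)
    if current:
        batches.append(current)
    return batches


# Hypothetical sample data: 2,000 records of roughly 1 KB each.
records = [{"id": i, "value": "x" * 1000} for i in range(2000)]
batches = batch_records(records)
print(len(batches), "request bodies")
```

Re-serializing the batch for every record keeps the sketch short but is slow for large inputs. A running byte count would be a reasonable refinement.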
Example 1: Using the Get method (default)
{
    "name": "HttpSourceDataInput",
    "properties": {
        "type": "HttpFile",
        "linkedServiceName": {
            "referenceName": "<HTTP linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "relativeUrl": "<relative url>",
            "additionalHeaders": "Connection: keep-alive\nUser-Agent: Mozilla/5.0\n"
        }
    }
}
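In Example 1, additionalHeaders is a single string in which each header appears as `Name: value` followed by a newline. The sketch below builds that string from a dictionary, assuming the format is exactly what the example shows. The `format_additional_headers` helper is hypothetical.

```python
def format_additional_headers(headers):
    """Join headers as 'Name: value' lines, each terminated by a newline."""
    for name, value in headers.items():
        # A line break inside a name or value would corrupt the header list.
        if any(ch in name or ch in value for ch in "\r\n"):
            raise ValueError(f"Header {name!r} contains a line break")
    return "".join(f"{name}: {value}\n" for name, value in headers.items())


headers = {"Connection": "keep-alive", "User-Agent": "Mozilla/5.0"}
additional_headers = format_additional_headers(headers)
print(repr(additional_headers))
```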
Example 2: Using the Post method
{
    "name": "HttpSourceDataInput",
    "properties": {
        "type": "HttpFile",
        "linkedServiceName": {
            "referenceName": "<HTTP linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "relativeUrl": "<relative url>",
            "requestMethod": "Post",
            "requestBody": "<body for POST HTTP request>"
        }
    }
}
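Neither dataset example sets the format or compression properties from the table. The sketch below shows how they might be added to typeProperties when the response should be parsed, and it checks the values against the lists in the table. It assumes that compression is an object with type and level keys, mirroring how the table describes format. The `parsing_properties` helper is hypothetical.

```python
import json

# Allowed values taken from the format and compression rows of the dataset table.
FORMAT_TYPES = {"TextFormat", "JsonFormat", "AvroFormat", "OrcFormat", "ParquetFormat"}
COMPRESSION_TYPES = {"GZip", "Deflate", "BZip2", "ZipDeflate"}
COMPRESSION_LEVELS = {"Optimal", "Fastest"}


def parsing_properties(format_type, compression_type=None, compression_level=None):
    """Return the format and compression part of an HttpFile dataset's typeProperties."""
    if format_type not in FORMAT_TYPES:
        raise ValueError(f"Unsupported format type: {format_type}")
    properties = {"format": {"type": format_type}}
    if compression_type is not None:
        if compression_type not in COMPRESSION_TYPES:
            raise ValueError(f"Unsupported compression type: {compression_type}")
        compression = {"type": compression_type}
        if compression_level is not None:
            if compression_level not in COMPRESSION_LEVELS:
                raise ValueError(f"Unsupported compression level: {compression_level}")
            compression["level"] = compression_level
        properties["compression"] = compression
    return properties


type_properties = {"relativeUrl": "<relative url>"}
type_properties.update(parsing_properties("TextFormat", "GZip", "Optimal"))
print(json.dumps(type_properties, indent=4))
```

To copy the response as-is to a file-based store, omit both properties, as the format row describes.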
This section provides a list of properties that the HTTP source supports.
For a full list of sections and properties that are available for defining activities, see Pipelines.
To copy data from HTTP, set the source type in the copy activity to HttpSource. The following properties are supported in the copy activity source section:
Property | Description | Required |
---|---|---|
type | The type property of the copy activity source must be set to HttpSource. | Yes |
httpRequestTimeout | The timeout (the TimeSpan value) for the HTTP request to get a response. This value is the timeout to get a response, not the timeout to read response data. The default value is 00:01:40. | No |
Example
"activities":[
    {
        "name": "CopyFromHTTP",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<HTTP input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "HttpSource",
                "httpRequestTimeout": "00:01:00"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]
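The httpRequestTimeout values in this section (00:01:40 and 00:01:00) follow an hh:mm:ss pattern. The sketch below converts a duration to that pattern, which can help when the timeout is computed rather than typed by hand. The pattern is inferred from those two values only, and the `to_timespan` helper is hypothetical.

```python
from datetime import timedelta


def to_timespan(duration):
    """Format a timedelta shorter than one day as an hh:mm:ss string."""
    total_seconds = int(duration.total_seconds())
    if not 0 <= total_seconds < 24 * 3600:
        raise ValueError("Expected a duration between 0 and 24 hours")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# 100 seconds corresponds to the documented default of 00:01:40.
default_timeout = to_timespan(timedelta(seconds=100))
source = {"type": "HttpSource", "httpRequestTimeout": to_timespan(timedelta(minutes=1))}
print(default_timeout, source)
```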
For a list of data stores that Copy Activity supports as sources and sinks in Azure Data Factory, see Supported data stores and formats.