title | description | services | documentationcenter | author | manager | ms.reviewer | ms.service | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Copy data from Web Table using Azure Data Factory | Microsoft Docs |
Learn about Web Table Connector of Azure Data Factory that lets you copy data from a web table to data stores supported by Data Factory as sinks. |
data-factory |
linda33wj |
craigg |
douglasl |
data-factory |
data-services |
na |
na |
conceptual |
04/28/2018 |
jingwang |
[!div class="op_single_selector" title1="Select the version of Data Factory service you are using:"]
This article outlines how to use the Copy Activity in Azure Data Factory to copy data from a Web table database. It builds on the copy activity overview article that presents a general overview of copy activity.
You can copy data from Web table database to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.
Specifically, this Web table connector supports extracting table content from an HTML page. To retrieve data from a HTTP/s endpoint, use HTTP connector instead.
To use this Web table connector, you need to set up a Self-hosted Integration Runtime. See Self-hosted Integration Runtime article for details.
[!INCLUDE data-factory-v2-connector-get-started]
The following sections provide details about properties that are used to define Data Factory entities specific to Web table connector.
The following properties are supported for Web table linked service:
Property | Description | Required |
---|---|---|
type | The type property must be set to: Web | Yes |
url | URL to the Web source | Yes |
authenticationType | Allowed value is: Anonymous. | Yes |
connectVia | The Integration Runtime to be used to connect to the data store. A Self-hosted Integration Runtime is required as mentioned in Prerequisites. | Yes |
Example:
{
"name": "WebLinkedService",
"properties": {
"type": "Web",
"typeProperties": {
"url" : "https://en.wikipedia.org/wiki/",
"authenticationType": "Anonymous"
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
For a full list of sections and properties available for defining datasets, see the datasets article. This section provides a list of properties supported by Web table dataset.
To copy data from Web table, set the type property of the dataset to WebTable. The following properties are supported:
Property | Description | Required |
---|---|---|
type | The type property of the dataset must be set to: WebTable | Yes |
path | A relative URL to the resource that contains the table. | No. When path is not specified, only the URL specified in the linked service definition is used. |
index | The index of the table in the resource. See Get index of a table in an HTML page section for steps to getting index of a table in an HTML page. | Yes |
Example:
{
"name": "WebTableInput",
"properties": {
"type": "WebTable",
"linkedServiceName": {
"referenceName": "<Web linked service name>",
"type": "LinkedServiceReference"
},
"typeProperties": {
"index": 1,
"path": "AFI's_100_Years...100_Movies"
}
}
}
For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by Web table source.
To copy data from Web table, set the source type in the copy activity to WebSource, no additional properties are supported.
Example:
"activities":[
{
"name": "CopyFromWebTable",
"type": "Copy",
"inputs": [
{
"referenceName": "<Web table input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "WebSource"
},
"sink": {
"type": "<sink type>"
}
}
}
]
To get the index of a table which you need to configure in dataset properties, you can use e.g. Excel 2016 as the tool as follows:
-
Launch Excel 2016 and switch to the Data tab.
-
Click New Query on the toolbar, point to From Other Sources and click From Web.
-
In the From Web dialog box, enter URL that you would use in linked service JSON (for example: https://en.wikipedia.org/wiki/) along with path you would specify for the dataset (for example: AFI%27s_100_Years...100_Movies), and click OK.
URL used in this example: https://en.wikipedia.org/wiki/AFI%27s_100_Years...100_Movies
-
If you see Access Web content dialog box, select the right URL, authentication, and click Connect.
-
Click a table item in the tree view to see content from the table and then click Edit button at the bottom.
-
In the Query Editor window, click Advanced Editor button on the toolbar.
-
In the Advanced Editor dialog box, the number next to "Source" is the index.
If you are using Excel 2013, use Microsoft Power Query for Excel to get the index. See Connect to a web page article for details. The steps are similar if you are using Microsoft Power BI for Desktop.
For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.