Commit

Merge branch 'master' of https://github.com/Azure/azure-content-pr into working-branch-2016-05-06
steelanddata committed May 5, 2016
2 parents 8f80edd + 2992035 commit 0289845
Showing 231 changed files with 1,392 additions and 2,369 deletions.
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
ms.tgt_pltfrm="na"
ms.devlang="na"
ms.topic="article"
-ms.date="03/24/2016"
+ms.date="04/05/2016"
ms.author="onewth"/>

# Getting started with the Text Analytics APIs to detect sentiment, key phrases, topics and language
@@ -25,7 +25,7 @@ You can use these APIs to detect sentiment, key phrases, topics and language fro

Please refer to the [API definitions](//go.microsoft.com/fwlink/?LinkID=759346) for technical documentation for the APIs.

-This guide is for version 2 of the APIs. For details on version 1 of the APIs, [refer to this document](machine-learning-apps-text-analytics/).
+This guide is for version 2 of the APIs. For details on version 1 of the APIs, [refer to this document](../machine-learning-apps-text-analytics/).

By the end of this tutorial, you will be able to programmatically detect:

@@ -237,6 +237,50 @@ Follow these steps to detect topics in your text.
}
}

Note that the successful response for topics from the `operations` endpoint will have the following schema:

{
"topics" : [{
"id" : "string",
"score" : "number",
"keyPhrase" : "string"
}],
"topicAssignments" : [{
"documentId" : "string",
"topicId" : "string",
"distance" : "number"
}],
"errors" : [{
"id" : "string",
"message" : "string"
}]
}

Explanations for each part of this response are as follows:

**topics**

| Key | Description |
|:-----|:----|
| id | A unique identifier for each topic. |
| score | The count of documents assigned to the topic. |
| keyPhrase | A summarizing word or phrase for the topic. |

**topicAssignments**

| Key | Description |
|:-----|:----|
| documentId | The identifier for the document. Matches the ID included in the input. |
| topicId | The ID of the topic to which the document has been assigned. |
| distance | The document-to-topic affiliation score, between 0 and 1. The lower the distance score, the stronger the topic affiliation. |

**errors**

| Key | Description |
|:-----|:----|
| id | The unique identifier of the input document that the error refers to. |
| message | Error message. |
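The two tables above suggest how to consume this response in practice: join `topicAssignments` back to `topics` by topic ID, and rank each document's topics by `distance` (lower means a stronger affiliation). A minimal sketch in Python — the response values here are illustrative, not real API output:

```python
def strongest_topics(response):
    """Map each documentId to its topic keyPhrases, strongest affiliation first.

    Per the schema above, a lower `distance` means a stronger affiliation,
    so assignments are sorted in ascending order of distance.
    """
    phrases = {t["id"]: t["keyPhrase"] for t in response["topics"]}
    by_doc = {}
    for a in response["topicAssignments"]:
        by_doc.setdefault(a["documentId"], []).append(a)
    return {
        doc: [phrases[a["topicId"]] for a in sorted(assigns, key=lambda a: a["distance"])]
        for doc, assigns in by_doc.items()
    }

# A minimal response shaped like the schema above (values are made up):
sample = {
    "topics": [
        {"id": "t1", "score": 2, "keyPhrase": "billing"},
        {"id": "t2", "score": 1, "keyPhrase": "shipping"},
    ],
    "topicAssignments": [
        {"documentId": "d1", "topicId": "t2", "distance": 0.9},
        {"documentId": "d1", "topicId": "t1", "distance": 0.1},
    ],
    "errors": [],
}
print(strongest_topics(sample))  # {'d1': ['billing', 'shipping']}
```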

## Next steps ##

Congratulations! You have now run text analytics on your data. Next, consider using a tool such as [Power BI](//powerbi.microsoft.com) to visualize the results, or automating your insights to get a real-time view of your text data.
4 changes: 2 additions & 2 deletions articles/data-catalog/data-catalog-common-scenarios.md
@@ -1,4 +1,4 @@
-<properties
+<properties
pageTitle="Azure Data Catalog common scenarios | Microsoft Azure"
description="An overview of common scenarios for Azure Data Catalog, including the registration and discovery of high-value data sources, enabling self-service business intelligence, and capturing existing tribal knowledge about data sources and processes."
services="data-catalog"
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/31/2016"
4 changes: 2 additions & 2 deletions articles/data-catalog/data-catalog-how-to-annotate.md
@@ -1,4 +1,4 @@
-<properties
+<properties
pageTitle="How to annotate data sources | Microsoft Azure"
description="How-to article highlighting how to annotate data assets in Azure Data Catalog, including friendly names, tags, descriptions, and experts."
services="data-catalog"
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/31/2016"
2 changes: 1 addition & 1 deletion articles/data-catalog/data-catalog-how-to-big-data.md
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/31/2016"
4 changes: 2 additions & 2 deletions articles/data-catalog/data-catalog-how-to-connect.md
@@ -1,4 +1,4 @@
-<properties
+<properties
pageTitle="How to connect to data sources | Microsoft Azure"
description="How-to article highlighting how to connect to data sources discovered with Azure Data Catalog."
services="data-catalog"
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/31/2016"
4 changes: 2 additions & 2 deletions articles/data-catalog/data-catalog-how-to-data-profile.md
@@ -1,4 +1,4 @@
-<properties
+<properties
pageTitle="How to Data profile data sources"
description="How-to article highlighting how to include table- and column-level data profiles when registering data sources in Azure Data Catalog, and how to use data profiles to understand data sources."
services="data-catalog"
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="04/07/2016"
4 changes: 2 additions & 2 deletions articles/data-catalog/data-catalog-how-to-discover.md
@@ -1,4 +1,4 @@
-<properties
+<properties
pageTitle="How to discover data sources | Microsoft Azure"
description="How-to article highlighting how to discover registered data assets with Azure Data Catalog, including searching and filtering and using the hit highlighting capabilities of the Azure Data Catalog portal."
services="data-catalog"
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/31/2016"
2 changes: 1 addition & 1 deletion articles/data-catalog/data-catalog-how-to-documentation.md
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="04/07/2016"
4 changes: 2 additions & 2 deletions articles/data-catalog/data-catalog-how-to-manage.md
@@ -1,4 +1,4 @@
-<properties
+<properties
pageTitle="How to Manage Data Assets | Microsoft Azure"
description="How-to article highlighting how to control visibility and ownership of data assets registered in Azure Data Catalog."
services="data-catalog"
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/31/2016"
4 changes: 2 additions & 2 deletions articles/data-catalog/data-catalog-how-to-register.md
@@ -1,4 +1,4 @@
-<properties
+<properties
pageTitle="How to register data sources | Microsoft Azure"
description="How-to article highlighting how to register data sources with Azure Data Catalog, including the metadata fields extracted during registration."
services="data-catalog"
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/31/2016"
2 changes: 1 addition & 1 deletion articles/data-catalog/data-catalog-how-to-save-pin.md
@@ -10,7 +10,7 @@
<tags
ms.service="data-catalog"
ms.devlang="NA"
-ms.topic="get-started-article"
+ms.topic="article"
ms.tgt_pltfrm="NA"
ms.workload="data-catalog"
ms.date="03/30/2016"
35 changes: 19 additions & 16 deletions articles/data-factory/data-factory-data-processing-using-batch.md
@@ -62,21 +62,17 @@ The solution counts the number of occurrences of a search term (“Microsoft”)
The sample solution uses Azure Batch (indirectly via an Azure Data Factory pipeline) to process data in a parallel manner on a pool of compute nodes, which is a managed collection of virtual machines.

4. Create an **Azure Batch pool** with at least 2 compute nodes.

You can download the source code for the [Azure Batch Explorer tool](https://github.com/Azure/azure-batch-samples/tree/master/CSharp/BatchExplorer), compile, and use it to create the pool (**highly recommended for this sample solution**), or use [Azure Batch Library for .NET](../batch/batch-dotnet-get-started.md) to create the pool. See [Azure Batch Explorer Sample Walkthrough](http://blogs.technet.com/b/windowshpc/archive/2015/01/20/azure-batch-explorer-sample-walkthrough.aspx) for step-by-step instructions for using the Azure Batch Explorer. You can also use the [New-AzureRmBatchPool](https://msdn.microsoft.com/library/mt628690.aspx) cmdlet to create an Azure Batch pool.

Use Batch Explorer to create the pool with the following setting:

- Enter an ID for the pool (**Pool ID**). Note the **ID of the pool**; you will need it when creating the Data Factory solution.

- Specify **Windows Server 2012 R2** for the **Operating System Family** setting.

- Specify **2** as value for the **Max tasks per compute node** setting.

- Specify **2** as value for the **Number of Target Dedicated** setting.

![](./media/data-factory-data-processing-using-batch/image2.png)

    1. In the [Azure Portal](https://portal.azure.com), click **Browse** in the left menu, and click **Batch Accounts**.
    2. Select your Azure Batch account to open the **Batch Account** blade.
    3. Click the **Pools** tile.
    4. In the **Pools** blade, click the **Add** button on the toolbar to add a pool.
        1. Enter an ID for the pool (**Pool ID**). Note the **ID of the pool**; you will need it when creating the Data Factory solution.
        2. Specify **Windows Server 2012 R2** for the **Operating System Family** setting.
        3. Select a **node pricing tier**.
        4. Enter **2** as the value for the **Target Dedicated** setting.
        5. Enter **2** as the value for the **Max tasks per node** setting.
        6. Click **OK** to create the pool.

5. [Azure Storage Explorer 6 (tool)](https://azurestorageexplorer.codeplex.com/) or [CloudXplorer](http://clumsyleaf.com/products/cloudxplorer) (from ClumsyLeaf Software). These are GUI tools for inspecting and altering the data in your Azure Storage projects, including the logs of your cloud-hosted applications.

1. Create a container named **mycontainer** with private access (no anonymous access)
@@ -808,6 +804,8 @@ In this step, you will test the pipeline by dropping files into the input folder

![](./media/data-factory-data-processing-using-batch/image14.png)

> [AZURE.NOTE] Download the source code for the [Azure Batch Explorer tool][batch-explorer], compile it, and use it to create and monitor Batch pools. See the [Azure Batch Explorer Sample Walkthrough][batch-explorer-walkthrough] for step-by-step instructions on using the Azure Batch Explorer.
7. You should see the output files in the **outputfolder** of **mycontainer** in your Azure blob storage.

![](./media/data-factory-data-processing-using-batch/image15.png)
@@ -899,7 +897,7 @@ You can extend this sample to learn more about Azure Data Factory and Azure Batc

See [Automatically scale compute nodes in an Azure Batch pool](../batch/batch-automatic-scaling.md) for details.

-The Azure Batch service could take 15-30 minutes to prepare the VM before running the custom activity on the VM.
+If the pool is using the default [autoScaleEvaluationInterval](https://msdn.microsoft.com/library/azure/dn820173.aspx), the Batch service could take 15-30 minutes to prepare the VM before running the custom activity. If the pool is using a different autoScaleEvaluationInterval, the Batch service could take up to autoScaleEvaluationInterval + 10 minutes.

5. In the sample solution, the **Execute** method invokes the **Calculate** method that processes an input data slice to produce an output data slice. You can write your own method to process input data and replace the Calculate method call in the Execute method with a call to your method.

@@ -937,3 +935,8 @@ After you process data you can consume it with online tools like **Microsoft Pow
- [Create and manage Azure Batch account in the Azure Portal](../batch/batch-account-create-portal.md)

- [Get started with Azure Batch Library .NET](../batch/batch-dotnet-get-started.md)


[batch-explorer]: https://github.com/Azure/azure-batch-samples/tree/master/CSharp/BatchExplorer
[batch-explorer-walkthrough]: http://blogs.technet.com/b/windowshpc/archive/2015/01/20/azure-batch-explorer-sample-walkthrough.aspx

28 changes: 17 additions & 11 deletions articles/data-factory/data-factory-use-custom-activities.md
@@ -47,16 +47,19 @@ For the purpose of the tutorial, you need to create an Azure Batch account with
1. Create an **Azure Batch account** using the [Azure Portal](http://manage.windowsazure.com). See [Create and manage an Azure Batch account][batch-create-account] article for instructions. Note down the Azure Batch account name and account key.

You can also use [New-AzureBatchAccount][new-azure-batch-account] cmdlet to create an Azure Batch account. See [Using Azure PowerShell to Manage Azure Batch Account][azure-batch-blog] for detailed instructions on using this cmdlet.
2. Create an **Azure Batch pool**. You can download the source code for the [Azure Batch Explorer tool][batch-explorer], compile, and use it (or) use [Azure Batch Library for .NET][batch-net-library] to create a Azure Batch pool. See [Azure Batch Explorer Sample Walkthrough][batch-explorer-walkthrough] for step-by-step instructions for using the Azure Batch Explorer.

You can also use [New-AzureBatchPool](https://msdn.microsoft.com/library/mt628690.aspx) cmdlet to create an Azure Batch pool.

You may want to create the Azure Batch pool with at least 2 compute nodes so that slices are processed in parallel. If you are using Batch Explorer:

- Enter an ID for the pool (**Pool ID**). Note the **ID of the pool**; you will need it when creating the Data Factory solution.
- Specify **Windows Server 2012 R2** for the Operating System Family setting.
- Specify **2** as value for the **Max tasks per compute node** setting.
- Specify **2** as value for the **Number of Target Dedicated** setting.
2. Create an **Azure Batch pool**.
	1. In the [Azure Portal](https://portal.azure.com), click **Browse** in the left menu, and click **Batch Accounts**.
	2. Select your Azure Batch account to open the **Batch Account** blade.
	3. Click the **Pools** tile.
	4. In the **Pools** blade, click the **Add** button on the toolbar to add a pool.
		1. Enter an ID for the pool (**Pool ID**). Note the **ID of the pool**; you will need it when creating the Data Factory solution.
		2. Specify **Windows Server 2012 R2** for the **Operating System Family** setting.
		3. Select a **node pricing tier**.
		4. Enter **2** as the value for the **Target Dedicated** setting.
		5. Enter **2** as the value for the **Max tasks per node** setting.
		6. Click **OK** to create the pool.

You can also use [New-AzureBatchPool](https://msdn.microsoft.com/library/mt628690.aspx) cmdlet to create an Azure Batch pool.

### High-level steps
1. **Create a custom activity** to use a Data Factory pipeline. The custom activity in this sample will contain the data transformation/processing logic.
@@ -653,9 +656,12 @@ In this step, you will create datasets to represent input and output data.
See [Monitor and Manage Pipelines](data-factory-monitor-manage-pipelines.md) for detailed steps for monitoring datasets and pipelines.

The Data Factory service creates a job in Azure Batch with the name: **adf-<pool name>:job-xxx**. A task is created for each activity run of a slice. If there are 10 slices ready to be processed, 10 tasks are created in this job. You can have more than one slice running in parallel if you have multiple compute nodes in the pool. You can also have more than one slice running on the same compute node if the maximum tasks per compute node is set to greater than 1.
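The parallelism described above is simple arithmetic: the number of slices that can be processed at once is bounded by the number of compute nodes multiplied by the maximum tasks per node. A quick sketch (the helper function is ours, not part of any SDK):

```python
def max_concurrent_slices(compute_nodes, max_tasks_per_node):
    """Upper bound on the number of slices the Batch pool can process at once."""
    return compute_nodes * max_tasks_per_node

# The pool created earlier in this walkthrough: 2 nodes, 2 tasks per node.
print(max_concurrent_slices(2, 2))  # 4
# With 10 slices ready, 10 tasks are created, but at most 4 run at a time;
# the remaining 6 wait in the Batch job's queue.
```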


![Batch Explorer tasks](./media/data-factory-use-custom-activities/BatchExplorerTasks.png)

> [AZURE.NOTE] Download the source code for the [Azure Batch Explorer tool][batch-explorer], compile it, and use it to create and monitor Batch pools. See the [Azure Batch Explorer Sample Walkthrough][batch-explorer-walkthrough] for step-by-step instructions on using the Azure Batch Explorer.
![Data Factory & Batch](./media/data-factory-use-custom-activities/DataFactoryAndBatch.png)

You can see the Azure Batch tasks associated with processing the slices in the Azure Batch Explorer as shown in the following diagram.
@@ -731,7 +737,7 @@ You can also create an Azure Batch pool with **autoscale** feature. For example,

See [Automatically scale compute nodes in an Azure Batch pool](../batch/batch-automatic-scaling.md) for details.

-The Batch service could take 15-30 minutes to prepare the VM before running the custom activity on the VM.
+If the pool is using the default [autoScaleEvaluationInterval](https://msdn.microsoft.com/library/azure/dn820173.aspx), the Batch service could take 15-30 minutes to prepare the VM before running the custom activity. If the pool is using a different autoScaleEvaluationInterval, the Batch service could take up to autoScaleEvaluationInterval + 10 minutes.
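The preparation delay described above can be budgeted from the pool's autoScaleEvaluationInterval. A hedged sketch of that rule of thumb — the helper function and the assumed 15-minute default interval are ours, derived from the guidance in this section:

```python
from datetime import timedelta

# Assumed default autoScaleEvaluationInterval (an assumption for this sketch).
DEFAULT_INTERVAL = timedelta(minutes=15)

def worst_case_vm_prep(auto_scale_evaluation_interval=DEFAULT_INTERVAL):
    """Estimate the worst-case delay before the custom activity starts.

    With the default interval, the service takes roughly 15-30 minutes;
    otherwise, budget autoScaleEvaluationInterval + 10 minutes.
    """
    if auto_scale_evaluation_interval == DEFAULT_INTERVAL:
        return timedelta(minutes=30)
    return auto_scale_evaluation_interval + timedelta(minutes=10)

print(worst_case_vm_prep())                      # 0:30:00
print(worst_case_vm_prep(timedelta(minutes=5)))  # 0:15:00
```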

## Use Azure HDInsight linked services
In the walkthrough, you used Azure Batch compute to run the custom activity. You can also use your own HDInsight cluster or have Data Factory create an on-demand HDInsight cluster and have the custom activity run on the HDInsight cluster. Here are the high level steps for using an HDInsight cluster.
