fix merge issue

cjgronlund · cjgronlund · commit b37e22c6f84f · 2015-03-10T10:09:53.000-07:00
diff --git a/articles/hdinsight-connect-excel-hive-ODBC-driver.md b/articles/hdinsight-connect-excel-hive-ODBC-driver.md
@@ -19,12 +19,6 @@ Before you begin this article, you must have the following:
 - A computer that is running Windows 8, Windows 7, Windows Server 2012, or Windows Server 2008 R2.
 - Office 2013 Professional Plus, Office 365 Pro Plus, Excel 2013 Standalone, or Office 2010 Professional Plus.
 
-##In this article
-
-1. [Install the Microsoft Hive ODBC Driver](#InstallHiveODBCDriver)
-2. [Create a Hive ODBC Data Source](#CreateHiveODBCDataSource)
-3. [Import data into Excel from an HDInsight cluster](#ImportData)
-4. [Next steps](#nextsteps)
 
 ##<a id="InstallHiveODBCDriver"></a>Install the Microsoft Hive ODBC Driver
 
diff --git a/articles/hdinsight-connect-excel-power-query.md b/articles/hdinsight-connect-excel-power-query.md
@@ -17,8 +17,6 @@
 	ms.author="bradsev"/>
 
 
-
-
 #Connect Excel to Hadoop with Power Query
 
 One key feature of Microsoft's big data solution is the integration of  Microsoft Business Intelligence (BI) components with Hadoop clusters in HDInsight. A primary example of this integration is the ability to connect Excel to the Azure storage account containing the data associated with your Hadoop cluster by using Microsoft Power Query for Excel. This article walks you through how to set up and use Power Query from Excel to query data associated with an Hadoop cluster managed with HDInsight. 
@@ -31,12 +29,6 @@ Before you begin this article, you must have the following:
 - A computer that is running Windows 7, Windows Server 2008 R2, or above.
 - Office 2013 Professional Plus, Office 365 Pro Plus, Excel 2013 Standalone, or Office 2010 Professional Plus.
 
-## In this article
-
-- [Install Microsoft Power Query for Excel](#InstallPowerQuery)
-- [Import data into Excel](#ImportData)
-- [Next steps](#NextSteps)
-
 
 ## <a id="InstallPowerQuery"></a>Install Microsoft Power Query for Excel
 
diff --git a/articles/hdinsight-dotnet-avro-serialization.md b/articles/hdinsight-dotnet-avro-serialization.md
@@ -22,17 +22,6 @@
 ##Overview
 This topic shows how to use the <a href="https://hadoopsdk.codeplex.com/wikipage?title=Avro%20Library" target="_blank">Microsoft Avro Library</a> to serialize objects and other data structures into streams in order to persist them to memory, a database or a file, and also how to deserialize them to recover the original objects. 
 
-## In this article
-
-- [Apache Avro](#apacheAvro)
-- [The Hadoop scenario](#hadoopScenario)
-- [Serialization in the Microsoft Avro Library](#serializationMAL) 
-- [Microsoft Avro Library prerequisites](#prerequisites)
-- [Microsoft Avro Library installation](#installation)
-- [Microsoft Avro Library source code](#sourceCode)
-- [Compiling the Schema with the Microsoft Avro Library](#compiling)
-- [Guide to the samples for the Microsoft Avro Library](#samples)
-
 
 ##<a name="apacheAvro"></a>Apache Avro
 The <a href="https://hadoopsdk.codeplex.com/wikipage?title=Avro%20Library" target="_blank">Microsoft Avro Library</a> implements the Apache Avro data serialization system for the Microsoft.NET environment. Apache Avro provides a compact binary data interchange format for serialization. It uses <a href="http://www.json.org" target="_blank">JSON</a> to define language agnostic schema that underwrites language interoperability. Data serialized in one language can be read in another. Currently C, C++, C#, Java, PHP, Python, and Ruby are supported. Detailed information on the format can be found in the <a href="http://avro.apache.org/docs/current/spec.html" target="_blank">Apache Avro Specification</a>. Note that the current version of the Microsoft Avro Library does not support the Remote Procedure Calls (RPC) part of this specification.
diff --git a/articles/hdinsight-hadoop-access-yarn-app-logs.md b/articles/hdinsight-hadoop-access-yarn-app-logs.md
@@ -34,12 +34,6 @@ To install the HDInsight SDK from a Visual Studio application, go the **Tools**
 
 This command adds .NET libraries for HDInsight and references to them to the current Visual Studio project.
 
-## In this article
-
-- [YARN Timeline Server](#YARNTimelineServer)
-- [YARN Applications and Logs](#YARNAppsAndLogs)
-- [Enumerating Applications and Downloading Logs Programmatically](#enumerate-and-download)
-
 
 ## <a name="YARNTimelineServer"></a>YARN Timeline Server
 
diff --git a/articles/hdinsight-hadoop-collect-debug-heap-dumps.md b/articles/hdinsight-hadoop-collect-debug-heap-dumps.md
@@ -22,13 +22,6 @@ Heap dumps can be automatically collected for Hadoop services and placed inside
 
 The collection of heap dumps for various services must be enabled for services on individual clusters. The default for this feature is to be off for a cluster. These heap dumps can be large in size so it is advisable to monitor the blob storage account where they are being saved once the collection has been enabled.
 
-## In this article
-
-- [For which services can heap dumps be enabled?](#whichServices)
-- [The configuration elements that enable heap dumps](#configuration)
-- [How to enable heap dumps with Azure HDInsight PowerShell](#powershell)
-- [How to enable heap dumps with HDInsight .NET SDK](#sdk)
-
 
 ## <a name="whichServices"></a>For which services can heap dumps be enabled?
 
diff --git a/articles/hdinsight-hadoop-r-scripts.md b/articles/hdinsight-hadoop-r-scripts.md
@@ -23,15 +23,6 @@ You can install R on any type of cluster in Hadoop on HDInsight using **Script A
 Script action lets you run scripts to customize a cluster, only when the cluster is being created. For more information, see [Customize HDInsight cluster using script action][hdinsight-cluster-customize].
 
 
-## In this article
-
-- [What is R?](#whatIs)
-- [How do I install R?](#install)
-- [How do I run R scripts in HDInsight](#useR) 
-- [Install R on HDInsight Hadoop clusters using PowerShell](#usingPS)
-- [Install R on HDInsight Hadoop clusters using the .NET SDK](#usingSDK)
-- [See also](#seeAlso)
-
 
 ## <a name="whatIs"></a>What is R?
 
diff --git a/articles/hdinsight-hadoop-script-actions.md b/articles/hdinsight-hadoop-script-actions.md
@@ -23,17 +23,6 @@ Script Actions provide Azure HDInsight functionality that is used to install add
 Script Action can be deployed from Azure PowerShell or by using the HDInsight .NET SDK.  For more information, see [Customize HDInsight cluster using Script Actions][hdinsight-cluster-customize].
 
 
-## In this article
-
-- [Best practices for script development](#bestPracticeScripting)
-- [Helper methods for custom scripts](#helpermethods)
-- [Checklist for deploying a Script Action](#deployScript)
-- [How to run a Script Action](#runScriptAction)
-- [Custom script samples](#sampleScripts) 
-- [How to test your custom script with the HDInsight Emulator](#testScript)
-- [How to debug your custom script](#debugScript)
-- [See also](#seeAlso)
-
 
 ## <a name="bestPracticeScripting"></a>Best practices for script development
 
diff --git a/articles/hdinsight-sample-10gb-graysort.md b/articles/hdinsight-sample-10gb-graysort.md
@@ -44,13 +44,6 @@ The input and output format, used by all three applications, read and write the
 
 - You must have installed Azure PowerShell, and have configured them for use with your account. For instructions on how to do this, see [Install and configure Azure PowerShell][powershell-install-configure].
 
-##In this article
-This topic shows you how to run the series of MapReduce programs that make up the Sample, presents the Java code for the MapReduce program, summarizes what you have learned, and outlines some next steps. It has the following sections.
-	
-1. [Run the sample with Azure PowerShell](#run-sample)	
-2. [The Java code for the TeraSort MapReduce program](#java-code)
-3. [Summary](#summary)	
-4. [Next steps](#next-steps)	
 
 <h2><a id="run-sample"></a>Run the sample with Azure PowerShell</h2>
 
diff --git a/articles/hdinsight-sample-csharp-streaming.md b/articles/hdinsight-sample-csharp-streaming.md
@@ -43,15 +43,7 @@ For more information on the Hadoop streaming interface, see [Hadoop Streaming][h
 - You must have provisioned an HDInsight cluster. For instructions on the various ways in which such clusters can be created, see [Provision HDInsight Clusters](../hdinsight-provision-clusters/)
 
 - You must have installed Azure PowerShell, and have configured them for use with your account. For instructions on how to do this, see [Install and configure Azure PowerShell][powershell-install-configure].
-
-
-##In this article
-This topic shows you how to run the sample, presents the Java code for the MapReduce program, summarizes what you have learned, and outlines some next steps. It has the following sections.
 	
-1. [Run the sample with Azure PowerShell](#run-sample)	
-2. [The C# code for Hadoop Streaming](#java-code)
-3. [Summary](#summary)	
-4. [Next steps](#next-steps)	
 
 <h2><a id="run-sample"></a>Run the sample with Azure PowerShell</h2>
 
diff --git a/articles/hdinsight-sample-pi-estimator.md b/articles/hdinsight-sample-pi-estimator.md
@@ -42,18 +42,11 @@ The other samples that are available to help you get up to speed in using HDInsi
 - You must have provisioned an HDInsight cluster. For instructions on the various ways in which such clusters can be created, see [Provision HDInsight Clusters](../hdinsight-provision-clusters/).
 
 - You must have installed Azure PowerShell, and have configured it for use with your account. For instructions on how to do this, see [Install and configure Azure PowerShell][powershell-install-configure].
-
-##In this article	
-This topic shows you how to run the sample, presents the Java code for the pi estimator MapReduce program, summarizes what you have learned, and outlines some next steps. It has the following sections:
-	
-1. [Run the sample with Azure PowerShell](#run-sample)	
-2. [The Java code for the pi estimator MapReduce program](#java-code)
-3. [Summary](#summary)	
-4. [Next steps](#next-steps)	
+
 
 <h2><a id="run-sample"></a>Run the sample with Azure PowerShell</h2>
 
-**To submit the MapReduce job**
+**To submit the MapReduce job**s
 
 1. Open Azure PowerShell. For instructions on how to use the Azure PowerShell console window, see [Install and configure Azure PowerShell][powershell-install-configure].
 2. Set the two variables in the following commands, and then run them:
diff --git a/articles/hdinsight-sample-wordcount.md b/articles/hdinsight-sample-wordcount.md
@@ -35,13 +35,6 @@ This tutorial shows you how to run a MapReduce word count example on an Hadoop c
 
 - You must have installed Azure PowerShell, and have configured them for use with your account. For instructions on how to do this, see [Install and configure Azure PowerShell][powershell-install-configure]
 
-##In this article	
-This topic shows you how to run the sample, presents the Java code for the MapReduce program, summarizes what you have learned, and outlines some next steps. It has the following sections.
-	
-1. [Run the sample using Azure PowerShell](#run-sample)	
-2. [The Java code for the WordCount MapReduce program](#java-code)
-3. [Summary](#summary)	
-4. [Next steps](#next-steps)	
 
 <h2><a id="run-sample"></a>Run the sample using Azure PowerShell</h2> 
 
diff --git a/articles/machine-learning-consume-web-services.md b/articles/machine-learning-consume-web-services.md
@@ -1,6 +1,6 @@
 <properties 
 	pageTitle="How to consume a Machine Learning web service that has been published from a Machine Learning experiment | Azure" 
-	description="required" 
+	description="Once a machine learning service is published, the RESTful web service that is made available can be consumed either as a request-response service or as a batch execution service." 
 	services="machine-learning" 
 	solutions="big-data" 
 	documentationCenter="" 
diff --git a/articles/machine-learning-feature-selection-and-engineering.md b/articles/machine-learning-feature-selection-and-engineering.md
@@ -21,7 +21,7 @@
 
 This topic explains the purposes of feature engineering and feature selection in the data enhancement process of machine learning. It illustrates what these processes involve using examples provided by Azure Machine Learning Studio.
 
-The training of models used in machine learning can often be enhanced by the selection or extraction of features from the raw data collected. A  example of an engineered feature in the context of learning how to classify the images of handwritten characters is a bit density map constructed from the raw bit distribution data. This map can help locate the edges of the characters more efficiently than the raw distribution.
+The training data used in machine learning can often be enhanced by the selection or extraction of features from the raw data collected. An example of an engineered feature in the context of learning how to classify the images of handwritten characters is a bit density map constructed from the raw bit distribution data. This map can help locate the edges of the characters more efficiently than the raw distribution.
 
 Engineered and selected features increase the efficiency of the training process which attempts to extract the key information contained in the data. They also improve the power of these models to classify the input data accurately and to predict outcomes of interest more robustly. Feature engineering and selection can also combine to make the learning more computationally tractable. It does so by enhancing and then reducing the number of features needed to calibrate or train a model. Mathematically speaking, the features selected to train the model are a minimal set of independent variables that explain the patterns in the data and then predict outcomes successfully. 
 
@@ -65,7 +65,7 @@ With the goal of constructing effective features in the training data, four regr
 
 Besides feature set A, which already exist in the original raw data, the other three sets of features are created through the feature engineering process. Feature set B captures very recent demand for the bikes. Feature set C captures the demand for bikes at a particular hour. Feature set D captures demand for bikes at particular hour and particular day of the week. The four training datasets each includes feature set A, A+B, A+B+C, and A+B+C+D, respectively.
 
-In the Azure Machine Learning experiment, these four training datasets are formed via four branches from the pre-processed input dataset. Except the left most branch, each of these branches contains an "Execute R Script" module, in which a set of derived features (feature set B, C, and D) are respectively constructed and appended to the imported dataset. The following figure demonstrates the R script used to create feature set B in the second branch from the left.
+In the Azure Machine Learning experiment, these four training datasets are formed via four branches from the pre-processed input dataset. Except for the leftmost branch, each of these branches contains an "Execute R Script" module, in which a set of derived features (feature set B, C, and D) are respectively constructed and appended to the imported dataset. The following figure demonstrates the R script used to create feature set B in the second branch from the left.
 
 ![create features](./media/machine-learning-feature-selection-and-engineering/addFeature-Rscripts.png) 
 
@@ -79,7 +79,7 @@ Feature engineering is widely applied in tasks related to text mining, such as d
 
 To achieve this task, a technique called **feature hashing** is applied to efficiently turn arbitrary text features into indices. Instead of associating each text feature (words/phrases) to a particular index, this method functions by applying a hash function to the features and using their hash values as indices directly.
 
-In Azure Machine Learning, there is a [Feature Hashing](http://msdn.microsoft.com/library/azure/c9a82660-2d9c-411d-8122-4d9e0b3ce92a) module that creates these word/phrase features conveniently. Following figure shows an example of using this module. The input dataset contains two columns: the book rating ranging from 1 to 5, and the actually review content. The goal of this "Feature Hashing" module is to retrieve a bunch of new features that show the occurrence frequency of the corresponding word(s)/phrase(s) within the particular book review. To use this module, we need to complete the following steps:
+In Azure Machine Learning, there is a [Feature Hashing](http://msdn.microsoft.com/library/azure/c9a82660-2d9c-411d-8122-4d9e0b3ce92a) module that creates these word/phrase features conveniently. The following figure shows an example of using this module. The input dataset contains two columns: the book rating ranging from 1 to 5, and the actual review content. The goal of this "Feature Hashing" module is to retrieve a set of new features that show the occurrence frequency of the corresponding word(s)/phrase(s) within the particular book review. To use this module, we need to complete the following steps:
 
 * First, select the column that contains the input text ("Col2" in this example). 
 * Second, set the "Hashing bitsize" to 8, which means 2^8=256 features will be created. The word/phase in all the text will be hashed to 256 indices. The parameter "Hashing bitsize" ranges from 1 to 31. The word(s)/phrase(s) are less likely to be hashed into the same index if setting it to be a larger number. 
@@ -98,16 +98,15 @@ Feature selection is a process that is commonly applied for the construction of
 * First, feature selection often increases classification accuracy by eliminating irrelevant, redundant, or highly correlated features. 
 * Second, it decreases the number of features which makes model training process more efficient. This is particularly important for learners that are expensive to train such as support vector machines. 
 
-Although feature selection does seek to reduce the number of features in the dataset used to train the model, it is not usually referred to by the term "dimensionality reduction". Feature selection methods extract a subset of original features in the data without changing them.  Dimensionality reduction methods employ engineered features that can transform the original features and thus modify them.
-
-Examples of dimensionality reduction methods include Principal Component Analysis, canonical correlation analysis, and Singular Value Decomposition.
+Although feature selection does seek to reduce the number of features in the dataset used to train the model, it is not usually referred to by the term "dimensionality reduction". Feature selection methods extract a subset of original features in the data without changing them.  Dimensionality reduction methods employ engineered features that can transform the original features and thus modify them. Examples of dimensionality reduction methods include Principal Component Analysis, canonical correlation analysis, and Singular Value Decomposition.
 
 Among others, one widely applied category of feature selection methods in a supervised context is called "filter based feature selection". By evaluating the correlation between each feature and the target attribute, these methods apply a statistical measure to assign a score to each feature. The features are then ranked by the score, which may be used to help set the threshold for keeping or eliminating a specific feature. Examples of the statistical measures used in these methods include Person correlation, mutual information, and the Chi squared test.
 
-In Azure Machine Learning Studio, there are modules provided for feature selection. As shown in the following figure, these modules include "Filter Based Feature Selection", "Fisher Liner Discriminant Analysis", and "Linear Discriminant Analysis".
+In Azure Machine Learning Studio, there are modules provided for feature selection. As shown in the following figure, these modules include "Filter Based Feature Selection" and "Fisher Linear Discriminant Analysis".
  
 ![Feature selection example](./media/machine-learning-feature-selection-and-engineering/feature-Selection.png)
 
+
 Consider, for example, the use of the [Filter Based Feature Selection](http://help.azureml.net/Content/html/818b356b-045c-412b-aa12-94a1d2dad90f.htm) module. For the purpose of convenience, we continue to use the text mining example outlined above. Assume that we want to build a regression model after a set of 256 features are created through the "Feature Hashing" module, and that the response variable is the "Col1" and represents a book review ratings ranging from 1 to 5. By setting "Feature scoring method" to be "Pearson Correlation", the "Target column" to be "Col1", and the "Number of desired features" to 50. Then the module "Filter Based Feature Selection" will produce a dataset containing 50 features together with the target attribute "Col1". The following figure shows the flow of this experiment and the input parameters we just described.
 
 ![Feature selection example](./media/machine-learning-feature-selection-and-engineering/feature-Selection1.png)
diff --git a/articles/media/machine-learning-feature-selection-and-engineering/feature-Selection.png b/articles/media/machine-learning-feature-selection-and-engineering/feature-Selection.png