articles/hdinsight/hdinsight-dotnet-avro-serialization.md (+8 −7)
```diff
@@ -14,7 +14,7 @@
 ms.tgt_pltfrm="na"
 ms.devlang="na"
 ms.topic="article"
-ms.date="10/29/2015"
+ms.date="02/04/2015"
 ms.author="jgao"/>
```
```diff
@@ -32,7 +32,7 @@ The serialized representation of an object in the Avro system consists of two pa
 ##The Hadoop scenario
 The Apache Avro serialization format is widely used in Azure HDInsight and other Apache Hadoop environments. Avro provides a convenient way to represent complex data structures within a Hadoop MapReduce job. The format of Avro files (Avro object container file) has been designed to support the distributed MapReduce programming model. The key feature that enables the distribution is that the files are “splittable” in the sense that one can seek any point in a file and start reading from a particular block.
 
-##Serialization in the Microsoft Avro Library
+##Serialization in Avro Library
 The .NET Library for Avro supports two ways of serializing objects:
 
 - **reflection** - The JSON schema for the types is automatically built from the data contract attributes of the .NET types to be serialized.
```
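The reflection approach mentioned in the context above derives a JSON schema from .NET data contract attributes. Since an Avro schema is itself plain JSON, a minimal sketch of what such a schema looks like can be inspected with nothing but the Python standard library. The `SensorData` record and its fields below are hypothetical, for illustration only; they are not part of this diff:

```python
import json

# A minimal Avro record schema, expressed as plain JSON.
# The record name, namespace, and fields are hypothetical examples.
schema_json = """
{
  "type": "record",
  "name": "SensorData",
  "namespace": "Sensors",
  "fields": [
    {"name": "Location", "type": "string"},
    {"name": "Value",    "type": "int"}
  ]
}
"""

schema = json.loads(schema_json)
field_names = [f["name"] for f in schema["fields"]]
print(schema["name"], field_names)  # SensorData ['Location', 'Value']
```

The reader resolves data against a schema like this one; when reflection is used, the library emits the equivalent JSON automatically from the attributed .NET type.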
```diff
@@ -41,14 +41,16 @@ The .NET Library for Avro supports two ways of serializing objects:
 When the data schema is known to both the writer and reader of the stream, the data can be sent without its schema. In cases when an Avro object container file is used, the schema is stored within the file. Other parameters, such as the codec used for data compression, can be specified. These scenarios are outlined in more detail and illustrated in the code examples below.
 
-## Microsoft Avro Library prerequisites
+## Install Avro Library
+
+The following are required before you install the library:
 - <a href="http://james.newtonking.com/json" target="_blank">Newtonsoft Json.NET</a> (6.0.4 or later)
 
 Note that the Newtonsoft.Json.dll dependency is downloaded automatically with the installation of the Microsoft Avro Library. The procedure for this is provided in the following section.
 
-## Microsoft Avro Library installation
+
 The Microsoft Avro Library is distributed as a NuGet package that can be installed from Visual Studio via the following procedure:
 
 1. Select the **Project** tab -> **Manage NuGet Packages...**
```
```diff
@@ -59,11 +61,10 @@ Note that the Newtonsoft.Json.dll (>=6.0.4) dependency is also downloaded automa
 You may want to visit the <a href="https://hadoopsdk.codeplex.com/wikipage?title=Avro%20Library" target="_blank">Microsoft Avro Library home page</a> to read the current release notes.
 
-##Microsoft Avro Library source code
+
 The Microsoft Avro Library source code is available at the <a href="https://hadoopsdk.codeplex.com/wikipage?title=Avro%20Library" target="_blank">Microsoft Avro Library home page</a>.
 
-##Compiling the schema by using the Microsoft Avro Library
+##Compile schemas using Avro Library
 The Microsoft Avro Library contains a code generation utility that allows creating C# types automatically based on the previously defined JSON schema. The code generation utility is not distributed as a binary executable, but can be easily built via the following procedure:
```

```diff
@@ -90,7 +91,7 @@ Please note that namespaces are extracted from the JSON schema, using the logic
-##<a name="samples"></a>Guide to the samples for the Microsoft Avro Library
+##Samples
 Six examples provided in this topic illustrate different scenarios supported by the Microsoft Avro Library. The Microsoft Avro Library is designed to work with any stream. In these examples, data is manipulated via memory streams rather than file streams or databases for simplicity and consistency. The approach taken in a production environment will depend on the exact scenario requirements, data source and volume, performance constraints, and other factors.
 
 The first two examples show how to serialize and deserialize data into memory stream buffers by using reflection and generic records. The schema in these two cases is assumed to be shared between the readers and writers out-of-band.
```
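The "schema shared out-of-band" idea in the paragraph above can be sketched in a few lines. This is not the Avro wire format itself, only an illustration of the principle: when writer and reader agree on the record layout beforehand, only the binary field values travel on the stream. The two-field record layout here is hypothetical:

```python
import io
import struct

# Writer and reader both know the layout out-of-band:
# a length-prefixed UTF-8 string followed by a 4-byte signed int.
# No field names or types are written to the stream.

def write_record(stream, location, value):
    data = location.encode("utf-8")
    stream.write(struct.pack(">I", len(data)))  # length prefix
    stream.write(data)                          # string bytes
    stream.write(struct.pack(">i", value))      # fixed-width int

def read_record(stream):
    (n,) = struct.unpack(">I", stream.read(4))
    location = stream.read(n).decode("utf-8")
    (value,) = struct.unpack(">i", stream.read(4))
    return location, value

buf = io.BytesIO()
write_record(buf, "MountVernon", 23)
buf.seek(0)
print(read_record(buf))  # ('MountVernon', 23)
```

An Avro object container file differs from this sketch in that it embeds the JSON schema (and optional codec) in a file header, so readers need no prior agreement.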
articles/hdinsight/hdinsight-hadoop-script-actions.md (+4 −4)
```diff
@@ -14,12 +14,12 @@
 ms.tgt_pltfrm="na"
 ms.devlang="na"
 ms.topic="article"
-ms.date="11/09/2015"
+ms.date="02/04/2016"
 ms.author="jgao"/>
 
 # Develop Script Action scripts for HDInsight
 
-Learn how to write Script Action scripts for HDInsight. For information on using Script Action scripts, see [Customize HDInsight clusters using Script Action](hdinsight-hadoop-customize-cluster.md). For the same article written for the HDInsight cluster on Linux operating system, see [Develop Script Action scripts for HDInsight](hdinsight-hadoop-script-actions-linux.md).
+Learn how to write Script Action scripts for HDInsight. For information on using Script Action scripts, see [Customize HDInsight clusters using Script Action](hdinsight-hadoop-customize-cluster.md). For the same article written for Linux-based HDInsight clusters, see [Develop Script Action scripts for HDInsight](hdinsight-hadoop-script-actions-linux.md).
 
 Script Action can be used to install additional software running on a Hadoop cluster or to change the configuration of applications installed on a cluster. Script actions are scripts that run on the cluster nodes when HDInsight clusters are deployed, and they are executed once nodes in the cluster complete HDInsight configuration. A script action is executed under system admin account privileges and provides full access rights to the cluster nodes. Each cluster can be provided with a list of script actions to be executed in the order in which they are specified.
```
25
25
@@ -30,7 +30,7 @@ Script Action can be used to install additional software running on a Hadoop clu
30
30
31
31
## Sample scripts
32
32
33
-
For provisioning HDInsight clusters on Windows operating system, the Script Action is Azure PowerShell script.The following is a sample script for configure the site configuration files:
33
+
For creating HDInsight clusters on Windows operating system, the Script Action is Azure PowerShell script.The following is a sample script for configure the site configuration files:
34
34
35
35
param (
36
36
[parameter(Mandatory)][string] $ConfigFileName,
```diff
@@ -214,7 +214,7 @@ or
 ### Throw exception for failed cluster deployment
 
-If you want to get accurately notified of the fact that cluster customization did not succeed as expected, it is important to throw an exception and fail the cluster provisioning. For instance, you might want to process a file if it exists and handle the error case where the file does not exist. This would ensure that the script exits gracefully and the state of the cluster is correctly known. The following snippet gives an example of how to achieve this:
+If you want to get accurately notified of the fact that cluster customization did not succeed as expected, it is important to throw an exception and fail the cluster creation. For instance, you might want to process a file if it exists and handle the error case where the file does not exist. This would ensure that the script exits gracefully and the state of the cluster is correctly known. The following snippet gives an example of how to achieve this:
```
articles/hdinsight/hdinsight-high-availability.md (+5 −5)
```diff
@@ -14,7 +14,7 @@
 ms.tgt_pltfrm="na"
 ms.devlang="multiple"
 ms.topic="article"
-ms.date="10/29/2015"
+ms.date="02/04/2016"
 ms.author="jgao"/>
```
```diff
@@ -49,7 +49,7 @@ Standard implementations of Hadoop clusters typically have a single head node. H
 
-## Check the active head node service status
+## Check active head node service status
 To determine which head node is active and to check on the status of the services running on that head node, you must connect to the Hadoop cluster by using the Remote Desktop Protocol (RDP). For the RDP instructions, see [Manage Hadoop clusters in HDInsight by using the Azure Portal](hdinsight-administer-use-management-portal.md#connect-to-hdinsight-clusters-by-using-rdp). Once you have remoted into the cluster, double-click on the **Hadoop Service Available** icon located on the desktop to obtain status about which head node the Namenode, Jobtracker, Templeton, Oozieservice, Metastore, and Hiveserver2 services are running, or for HDI 3.0, the Namenode, Resource Manager, History Server, Templeton, Oozieservice, Metastore, and Hiveserver2 services.
 
 The head nodes are allocated as large virtual machines (VMs) by default. This size is adequate for the management of most Hadoop jobs run on the cluster. But there are scenarios that may require extra-large VMs for the head nodes. One example is when the cluster has to manage a large number of small Oozie jobs.
 
 Extra-large VMs can be configured by using either Azure PowerShell cmdlets or the HDInsight SDK.
```
```diff
@@ -100,8 +100,8 @@ For the SDK, the story is similar. The creation and provisioning of a cluster by
     };
 
-**References**
+## Next Steps
 
-- [ZooKeeper](http://zookeeper.apache.org/)
+- [Apache ZooKeeper](http://zookeeper.apache.org/)
 - [Connect to HDInsight clusters using RDP](hdinsight-administer-use-management-portal.md#rdp)
```