title | description | services | documentationcenter | tags | author | manager | editor | ms.assetid | ms.service | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author | ROBOTS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Use R in HDInsight to customize clusters - Azure \| Microsoft Docs | Learn how to install R using Script Action, and use R on HDInsight clusters. | hdinsight | | azure-portal | mumian | jhubbard | cgronlun | be851270-afa5-4af0-a69e-2d343a4deeb7 | hdinsight | big-data | na | na | article | 05/25/2017 | jgao | NOINDEX |
Learn how to customize a Windows-based HDInsight cluster with R by using Script Action, and how to use R on HDInsight clusters. The HDInsight offering includes R Server as part of your HDInsight cluster, which allows R scripts to use MapReduce and Spark to run distributed computations. For more information, see Get started using R Server on HDInsight. For information on using R with a Linux-based cluster, see Install and use R on HDInsight Hadoop clusters (Linux).
You can install R on any type of cluster (Hadoop, Storm, HBase, Spark) on Azure HDInsight by using Script Action. A sample script to install R on an HDInsight cluster is available from a read-only Azure storage blob at https://hdiconfigactions.blob.core.windows.net/rconfigactionv02/r-installer-v02.ps1.
Related articles
- Install and use R on HDInsight Hadoop clusters (Linux)
- Create Hadoop clusters in HDInsight: general information on creating HDInsight clusters
- Customize HDInsight cluster using Script Action: general information on customizing HDInsight clusters using Script Action
- Develop Script Action scripts for HDInsight
The R Project for Statistical Computing is an open source language and environment for statistical computing. R provides hundreds of built-in statistical functions and its own programming language, which combines aspects of functional and object-oriented programming. It also provides extensive graphical capabilities. R is the preferred programming environment for most professional statisticians and scientists in a wide variety of fields.
R is compatible with Azure Blob Storage (WASB) so that data that is stored there can be processed using R on HDInsight.
A sample script to install R on an HDInsight cluster is available from a read-only blob in Azure Storage. This section provides instructions about how to use the sample script while creating the cluster using the Azure Portal.
Note
The sample script was introduced with HDInsight cluster version 3.1. For more information about HDInsight cluster versions, see HDInsight cluster versions.
- When you create an HDInsight cluster from the Portal, click Optional Configuration, and then click Script Actions.
- On the Script Actions page, enter the following values:

Property | Value |
---|---|
Name | Specify a name for the script action, for example, Install R. |
Script URI | Specify the URI of the script that is invoked to customize the cluster, for example, https://hdiconfigactions.blob.core.windows.net/rconfigactionv02/r-installer-v02.ps1 |
Node Type | Specify the nodes on which the customization script runs. You can choose All Nodes, Head nodes only, or Worker nodes only. |
Parameters | Specify the parameters, if the script requires any. The script to install R does not require any parameters, so you can leave this blank. |

You can add more than one script action to install multiple components on the cluster. After you have added the scripts, click the check mark to start creating the cluster.
You can also use the script to install R on HDInsight by using Azure PowerShell or the HDInsight .NET SDK. Instructions for these procedures are provided later in this article.
This section describes how to run an R script on the Hadoop cluster with HDInsight.
- Establish a Remote Desktop connection to the cluster: From the Portal, enable Remote Desktop for the cluster you created with R installed, and then connect to the cluster. For instructions, see Connect to HDInsight clusters using RDP.
- Open the R console: The R installation puts a link to the R console on the desktop of the head node. Click on it to open the R console.
- Run the R script: The R script can be run directly from the R console by pasting it, selecting it, and pressing ENTER. Here is a simple example script that generates the numbers 1 to 100 and then multiplies them by 2.
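    # Load the RHadoop packages installed with R
    library(rmr2)
    library(rhdfs)
    # Write the integers 1 to 100 to the distributed file system
    ints = to.dfs(1:100)
    # Map step: pair each value with its double
    calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
    # Read the results back and print them to the console
    from.dfs(calc)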
The first two lines of code load the RHadoop libraries that are installed with R. The final line prints the results to the console. The output should look like this:
[1,] 1 2
[2,] 2 4
.
.
.
[98,] 98 196
[99,] 99 198
[100,] 100 200
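To sanity-check the expected values without a cluster, the same computation can be sketched in base R. This is only a local illustration, not part of the sample script: it assumes the map function receives the whole vector 1:100 as its values, the variable names `v` and `doubled` are chosen here for illustration, and no RHadoop packages are involved.

```r
# Local, base-R sketch of what the map function computes (no Hadoop involved).
# Assumption: the map function sees the full input vector as its values.
v <- 1:100

# cbind() places the input values and their doubles side by side,
# producing a matrix with 100 rows and 2 columns.
doubled <- cbind(v, 2 * v)

# Show the first and last rows for comparison with the cluster output.
print(head(doubled, 2))
print(tail(doubled, 2))
```

If the rows printed here disagree with what the cluster returns, the difference most likely lies in the cluster setup rather than in the map function itself.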
To install R by using Azure PowerShell, see Customize HDInsight clusters using Script Action. The sample there demonstrates how to install Spark using Azure PowerShell. To install R instead, customize the script to use https://hdiconfigactions.blob.core.windows.net/rconfigactionv02/r-installer-v02.ps1.
To install R by using the HDInsight .NET SDK, see Customize HDInsight clusters using Script Action. The sample there demonstrates how to install Spark using the .NET SDK. To install R instead, customize the script to use https://hdiconfigactions.blob.core.windows.net/rconfigactionv02/r-installer-v02.ps1.
- Install and use Spark on HDInsight clusters: Script Action sample about installing Spark
- Install Giraph on HDInsight clusters: Script Action sample about installing Giraph
- Install Solr on HDInsight clusters: Script Action sample about installing Solr