Commit
[SPARK-11407][SPARKR] Add doc for running from RStudio
![image](https://cloud.githubusercontent.com/assets/8969467/10871746/612ba44a-80a4-11e5-99a0-40b9931dee52.png)
(This is without css, but you get the idea)
shivaram

Author: felixcheung <[email protected]>

Closes apache#9401 from felixcheung/rstudioprogrammingguide.
felixcheung authored and shivaram committed Nov 3, 2015
1 parent ebf8b0b commit a9676cc
Showing 1 changed file with 43 additions and 3 deletions: docs/sparkr.md
@@ -30,24 +30,64 @@ The entry point into SparkR is the `SparkContext` which connects your R program
You can create a `SparkContext` using `sparkR.init` and pass in options such as the application
name, any Spark packages depended on, etc. Further, to work with DataFrames we will need a `SQLContext`,
which can be created from the SparkContext. If you are working from the `sparkR` shell, the
`SQLContext` and `SparkContext` should already be created for you, and you would not need to call
`sparkR.init`.

<div data-lang="r" markdown="1">
{% highlight r %}
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)
{% endhighlight %}
</div>
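As a quick illustration of what the `sqlContext` is for (a sketch only, assuming a working Spark installation and the 1.x SparkR API shown above), it can be used to turn a local R data frame into a distributed DataFrame:

{% highlight r %}
# Assumes sc and sqlContext were created as above (or by the sparkR shell).
# createDataFrame converts a local R data.frame (here the built-in
# "faithful" dataset) into a distributed SparkR DataFrame.
df <- createDataFrame(sqlContext, faithful)
head(df)  # inspect the first rows of the distributed DataFrame
{% endhighlight %}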

## Starting Up from RStudio

You can also start SparkR from RStudio, the R shell, Rscript, or other R IDEs, and connect your
R program to a Spark cluster from there. To start, make sure `SPARK_HOME` is set in the environment
(you can check this with [Sys.getenv](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Sys.getenv.html)),
load the SparkR package, and call `sparkR.init` as below. When calling `sparkR.init`, you can
also specify certain Spark driver properties. Normally these
[Application properties](configuration.html#application-properties) and
[Runtime Environment](configuration.html#runtime-environment) options cannot be set
programmatically, because by then the driver JVM process has already been started; in this case,
SparkR takes care of this for you. To set them, pass them as you would other configuration
properties in the `sparkEnvir` argument to `sparkR.init()`.

<div data-lang="r" markdown="1">
{% highlight r %}
# Set SPARK_HOME if it is not already set in the environment
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/spark")
}
# Load the SparkR package installed under SPARK_HOME
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sc <- sparkR.init(master = "local[*]", sparkEnvir = list(spark.driver.memory = "2g"))
{% endhighlight %}
</div>

The following options can be set in `sparkEnvir` with `sparkR.init` from RStudio:

<table class="table">
<tr><th>Property Name</th><th>Property group</th><th><code>spark-submit</code> equivalent</th></tr>
<tr>
<td><code>spark.driver.memory</code></td>
<td>Application Properties</td>
<td><code>--driver-memory</code></td>
</tr>
<tr>
<td><code>spark.driver.extraClassPath</code></td>
<td>Runtime Environment</td>
<td><code>--driver-class-path</code></td>
</tr>
<tr>
<td><code>spark.driver.extraJavaOptions</code></td>
<td>Runtime Environment</td>
<td><code>--driver-java-options</code></td>
</tr>
<tr>
<td><code>spark.driver.extraLibraryPath</code></td>
<td>Runtime Environment</td>
<td><code>--driver-library-path</code></td>
</tr>
</table>
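Several of these driver properties can be combined in a single `sparkEnvir` list. The values below are placeholders for illustration, not recommendations; substitute your own paths and settings:

{% highlight r %}
# Hypothetical settings: 2g of driver memory plus an extra JVM option,
# matching the Application Properties and Runtime Environment rows above.
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "2g",
                                    spark.driver.extraJavaOptions = "-XX:+UseG1GC"))
{% endhighlight %}

On the command line, the equivalent would be passing `--driver-memory 2g --driver-java-options "-XX:+UseG1GC"` to `spark-submit`.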

</div>

