There are currently two ways to use TiSpark on SparkR: directly through the sparkR shell, or by submitting your own R script with spark-submit.
The first way is to use the sparkR shell directly. This is the simplest option; a decent Spark environment should be enough.
- Make sure you have the latest version of TiSpark and a jar with all of TiSpark's dependencies.
- Add the configurations listed in the README to your `$SPARK_HOME/conf/spark-defaults.conf` (a sample is shown after these steps).
- Run this command in your `$SPARK_HOME` directory:

  ```
  ./bin/sparkR --jars /where-ever-it-is/tispark-${name_with_version}.jar
  ```
- To use TiSpark, run these commands in the sparkR shell:

  ```R
  sql("use tpch_test")
  count <- sql("select count(*) from customer")
  head(count)
  ```
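The exact configuration keys to put in `spark-defaults.conf` are listed in the TiSpark README and depend on your TiSpark version. As a rough sketch only, assuming a TiSpark release that registers `TiExtensions` and a placement driver (PD) reachable at the placeholder address `127.0.0.1:2379`, the file might contain entries like:

```
# Placeholder values for illustration; copy the actual keys from the TiSpark README
spark.sql.extensions        org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses  127.0.0.1:2379
```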
The second way, through spark-submit, is useful when you want to execute your own R scripts.
- Create an R file named `test.R` as below:

  ```R
  library(SparkR)
  sparkR.session()
  sql("use tpch_test")
  count <- sql("select count(*) from customer")
  head(count)
  ```
- Prepare your TiSpark environment as above and execute:

  ```
  ./bin/spark-submit --jars /where-ever-it-is/tispark-${name_with_version}.jar test.R
  ```
- Result:

  ```
  +--------+
  |count(1)|
  +--------+
  |     150|
  +--------+
  ```
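If you want to work with the query result as an ordinary R object inside your script, SparkR's `collect()` brings the rows back to the driver as a local data frame. The snippet below is only a sketch that extends `test.R` and assumes the same `tpch_test` data as above:

```R
library(SparkR)
sparkR.session()

sql("use tpch_test")
count <- sql("select count(*) from customer")

# collect() materializes the SparkDataFrame as a local R data.frame
local_count <- collect(count)
print(local_count$`count(1)`)
```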