TiSparkR

Usage

There are currently two ways to use TiSpark with SparkR:

Directly via sparkR

This is the simplest way; a working Spark environment is all you need.

  1. Make sure you have the latest version of TiSpark, packaged as a jar that includes all of TiSpark's dependencies.

  2. Add the required configuration listed in the TiSpark README to your $SPARK_HOME/conf/spark-defaults.conf.

  3. Run this command in your $SPARK_HOME directory:

./bin/sparkR --jars /where-ever-it-is/tispark-${name_with_version}.jar
  4. In the SparkR shell, run these commands to query data through TiSpark:
sql("use tpch_test")
count <- sql("select count(*) from customer")
# head() returns the first rows as a local R data.frame
head(count)
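Step 2 depends on the settings listed in the TiSpark README, and those keys differ between TiSpark releases. As a minimal sketch, assuming a TiSpark 2.x-style setup that enables `org.apache.spark.sql.TiExtensions` through `spark.sql.extensions` and points `spark.tispark.pd.addresses` at your PD (Placement Driver) servers, a small shell function could append them to `spark-defaults.conf`. The function name `add_tispark_conf` and the default address `127.0.0.1:2379` are illustrative assumptions, not part of TiSpark. Confirm the keys against the README for your version before relying on this.

```shell
#!/bin/sh
# Hypothetical helper: append TiSpark settings to a spark-defaults.conf file.
# Assumes TiSpark 2.x-style keys; verify them against the TiSpark README.
# Usage: add_tispark_conf <conf-file> [pd-address]
add_tispark_conf() {
  conf_file="$1"
  pd_addr="${2:-127.0.0.1:2379}"  # placeholder PD address (assumption)
  if [ -z "$conf_file" ]; then
    echo "usage: add_tispark_conf <conf-file> [pd-address]" >&2
    return 1
  fi
  # Create the conf directory if it does not exist yet
  mkdir -p "$(dirname "$conf_file")" || return 1
  # spark-defaults.conf takes one whitespace-separated "key value" pair per line;
  # printf reuses its format string for each pair of arguments
  printf '%s %s\n' \
    "spark.sql.extensions" "org.apache.spark.sql.TiExtensions" \
    "spark.tispark.pd.addresses" "$pd_addr" >> "$conf_file"
}
```

For example, `add_tispark_conf "$SPARK_HOME/conf/spark-defaults.conf" "pd0:2379"`, where `pd0:2379` is a hypothetical PD address. The function appends without checking for existing entries, so run it once per Spark installation.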

Via spark-submit

This approach is useful when you want to run your own R scripts.

  1. Create an R file named test.R with the following contents:
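library(SparkR)
# Start a SparkR session (spark-submit adds the TiSpark jar via --jars)
sparkR.session()
sql("use tpch_test")
count <- sql("select count(*) from customer")
# showDF() prints the result as a table; head() would return an R data.frame instead
showDF(count)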
  2. Prepare your TiSpark environment as described above, then run this command in your $SPARK_HOME directory:
./bin/spark-submit --jars /where-ever-it-is/tispark-${name_with_version}.jar test.R
  3. Result:
+--------+
|count(1)|
+--------+
|     150|
+--------+
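The sketch below wraps the `spark-submit` command above so that the script, Spark installation, and jar paths are checked before Spark starts. The function name `submit_tispark_r` and the `TISPARK_JAR` environment variable are hypothetical. `SPARK_HOME` is assumed to point at your Spark installation.

```shell
#!/bin/sh
# Hypothetical wrapper around $SPARK_HOME/bin/spark-submit for TiSpark R scripts.
# Assumes SPARK_HOME points at a Spark install and TISPARK_JAR at the TiSpark jar
# that bundles its dependencies.
# Usage: submit_tispark_r <script.R> [extra spark-submit options...]
submit_tispark_r() {
  script="$1"
  if [ -z "$script" ] || [ ! -f "$script" ]; then
    echo "R script not found: $script" >&2
    return 1
  fi
  shift
  if [ -z "$SPARK_HOME" ] || [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "SPARK_HOME is unset or has no executable bin/spark-submit" >&2
    return 1
  fi
  if [ ! -f "$TISPARK_JAR" ]; then
    echo "TiSpark jar not found: $TISPARK_JAR" >&2
    return 1
  fi
  # spark-submit options must precede the application file; the jar goes via --jars
  "$SPARK_HOME/bin/spark-submit" --jars "$TISPARK_JAR" "$@" "$script"
}
```

For example, with `TISPARK_JAR=/where-ever-it-is/tispark-${name_with_version}.jar`, running `submit_tispark_r test.R` issues the same command as above. Extra options such as `--master local` are placed between `--jars` and the script because `spark-submit` expects its options before the application file.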