TODO instructions to be provided
TPCH timing results are written to stdout in the following form:
TPCH_Result,<language>,<test type>,<query number>,<iteration>,<total time taken for iteration in milliseconds>,<time taken to run query in milliseconds>
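Because each result is a single comma-separated line, it can be pulled out of the driver's stdout with a few lines of code. The following is a minimal sketch, not part of the benchmark itself: the `TpchResult` class, the `parse_result_line` helper, and the sample line are hypothetical, and it assumes that no field contains a comma and that both times parse as numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TpchResult:
    """One TPCH_Result line (hypothetical container, not part of the benchmark)."""
    language: str
    test_type: str
    query_number: int
    iteration: int
    total_ms: float   # total time taken for the iteration
    query_ms: float   # time taken to run the query itself

    @property
    def overhead_ms(self) -> float:
        # Time spent in the iteration outside of running the query.
        return self.total_ms - self.query_ms


def parse_result_line(line: str) -> Optional[TpchResult]:
    """Return a TpchResult for a TPCH_Result line, or None for any other line."""
    fields = line.strip().split(",")
    # Assumption: exactly 7 comma-separated fields, none containing a comma.
    if len(fields) != 7 or fields[0] != "TPCH_Result":
        return None
    _, language, test_type, query, iteration, total_ms, query_ms = fields
    return TpchResult(language, test_type, int(query), int(iteration),
                      float(total_ms), float(query_ms))


# Illustrative line only; the values are made up, not measured results.
sample = "TPCH_Result,CSharp,Sql,1,0,5230,4100"
result = parse_result_line(sample)
print(result.query_number, result.overhead_ms)
```

Returning `None` for non-matching lines lets the same function be applied to every line of a mixed Spark log without pre-filtering.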
- Cold Run
  - Each <query, iteration> pair uses a new spark-submit
- Warm Run
  - Each query uses a new spark-submit
  - Each iteration reuses the Spark Session after creating the DataFrame, and therefore skips the load phase that does file enumeration
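The difference between the two run types comes down to how many spark-submit invocations are made and which iterations each one covers. The sketch below only illustrates that bookkeeping; the `plan_submits` helper is hypothetical, it launches nothing, and numbering iterations from 0 is an assumption.

```python
def plan_submits(queries, iterations, warm):
    """Return one (query, iterations_covered) entry per spark-submit invocation.

    Hypothetical helper for illustration; it does not launch anything.
    Assumption: iterations are numbered from 0.
    """
    if warm:
        # Warm run: one spark-submit per query, covering all of its iterations.
        return [(q, list(range(iterations))) for q in queries]
    # Cold run: one spark-submit per (query, iteration) pair.
    return [(q, [i]) for q in queries for i in range(iterations)]


cold = plan_submits(queries=[1, 2], iterations=3, warm=False)
warm = plan_submits(queries=[1, 2], iterations=3, warm=True)
print(len(cold), len(warm))
```

With 2 queries and 3 iterations, the cold plan contains 6 invocations and the warm plan contains 2, which is why cold runs pay the start-up and load cost on every measurement.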
- Ensure that the Microsoft.Spark.Worker is properly installed in your cluster.
- Build microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar and the CSharp Tpch benchmark application by following the build instructions.
- Upload run_csharp_benchmark.sh, the Tpch benchmark application, and microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar to the cluster.
- Run the benchmark by invoking:
run_csharp_benchmark.sh \
    <number of cold iterations> \
    <num_executors> \
    <driver_memory> \
    <executor_memory> \
    <executor_cores> \
    </path/to/Tpch.dll> \
    </path/to/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar> \
    </path/to/Tpch executable> \
    </path/to/dataset> \
    <number of iterations> \
    <true for sql tests, false for functional tests>
- Upload run_python_benchmark.sh and all python tpch benchmark files to the cluster.
- Run the benchmark by invoking:
run_python_benchmark.sh \
    <number of cold iterations> \
    <num_executors> \
    <driver_memory> \
    <executor_memory> \
    <executor_cores> \
    </path/to/tpch.py> \
    </path/to/dataset> \
    <number of iterations> \
    <true for sql tests, false for functional tests>
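Since the script takes nine positional arguments, it is easy to pass them in the wrong order. One way to guard against that is to assemble the command from named values, as in the sketch below. The `build_python_benchmark_command` helper is hypothetical, and every value in the example call (paths, memory sizes, counts) is a placeholder chosen for illustration, not a recommended setting.

```python
import shlex


def build_python_benchmark_command(cold_iterations, num_executors, driver_memory,
                                   executor_memory, executor_cores, tpch_py,
                                   dataset, iterations, sql_tests):
    """Assemble the argument list in the positional order documented above.

    Hypothetical helper; it only builds the command and does not run it.
    """
    return [
        "run_python_benchmark.sh",
        str(cold_iterations),
        str(num_executors),
        driver_memory,
        executor_memory,
        str(executor_cores),
        tpch_py,
        dataset,
        str(iterations),
        "true" if sql_tests else "false",  # true for sql tests, false for functional tests
    ]


# Placeholder values for illustration only.
cmd = build_python_benchmark_command(
    cold_iterations=1,
    num_executors=2,
    driver_memory="4g",
    executor_memory="4g",
    executor_cores=2,
    tpch_py="/path/to/tpch.py",
    dataset="/path/to/dataset",
    iterations=3,
    sql_tests=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run` on the cluster node where the script was uploaded; `shlex.join` (Python 3.8+) is used here only to print a shell-safe form of it.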
- Run mvn package to build the Scala tpch benchmark application.
- Upload run_scala_benchmark.sh and microsoft-spark-benchmark-<version>.jar to the cluster.
- Run the benchmark by invoking:
run_scala_benchmark.sh \
    <number of cold iterations> \
    <num_executors> \
    <driver_memory> \
    <executor_memory> \
    <executor_cores> \
    </path/to/microsoft-spark-benchmark-<version>.jar> \
    </path/to/dataset> \
    <number of iterations> \
    <true for sql tests, false for functional tests>