Benchmarking

Generate Data

TODO: instructions to be provided.

Cluster Run

TPC-H timing results are written to stdout as lines of the following form:

TPCH_Result,<language>,<test type>,<query number>,<iteration>,<total time taken for iteration in milliseconds>,<time taken to run query in milliseconds>
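Because each result is a single comma-separated line, it can be pulled out of the captured driver output with a few lines of Python. The sketch below is a hypothetical helper, not part of the benchmark scripts. It assumes `<iteration>` is an integer, keeps `<query number>` as text because its exact form is not specified here, and averages the per-query time across iterations.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class TpchResult(NamedTuple):
    language: str
    test_type: str
    query: str       # kept as text; the exact query-number format is an assumption
    iteration: int   # assumed to be an integer counter
    total_ms: float  # total time taken for the iteration
    query_ms: float  # time taken to run the query


def parse_results(lines: Iterable[str]) -> List[TpchResult]:
    """Extract TPCH_Result records from captured stdout, ignoring all other lines."""
    results = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("TPCH_Result,"):
            continue  # Spark logs and other output share stdout
        fields = line.split(",")
        if len(fields) != 7:
            continue  # skip truncated or malformed records
        _, language, test_type, query, iteration, total_ms, query_ms = fields
        results.append(TpchResult(language, test_type, query, int(iteration),
                                  float(total_ms), float(query_ms)))
    return results


def mean_query_ms(results: Iterable[TpchResult]) -> Dict[Tuple[str, str, str], float]:
    """Average query time per (language, test type, query) across iterations."""
    groups: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for r in results:
        groups[(r.language, r.test_type, r.query)].append(r.query_ms)
    return {key: statistics.mean(times) for key, times in groups.items()}
```

Any iterable of lines works as input, so an open log file can be passed to `parse_results` directly and the result handed to `mean_query_ms`.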

Cold Run
- Each (query, iteration) pair uses a new spark-submit invocation.
Warm Run
- Each query uses a new spark-submit invocation.
- Each iteration reuses the Spark session after the DataFrame has been created, and therefore skips the load phase (file enumeration).
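To make the difference between the two modes concrete, the sketch below only groups runs by the spark-submit invocation that would execute them; it does not launch Spark. `plan_submits` is a hypothetical helper for illustration, not part of the benchmark scripts.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (query number, iteration index)


def plan_submits(queries: List[int], iterations: int, mode: str) -> List[List[Run]]:
    """Group (query, iteration) runs by the spark-submit invocation that would execute them."""
    if mode == "cold":
        # Cold: a fresh spark-submit for every (query, iteration) pair.
        return [[(q, i)] for q in queries for i in range(iterations)]
    if mode == "warm":
        # Warm: one spark-submit per query; its iterations share the Spark session,
        # so the load phase (file enumeration) is not repeated per iteration.
        return [[(q, i) for i in range(iterations)] for q in queries]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, if all 22 TPC-H queries were run with 3 iterations each, a cold run would need 22 × 3 = 66 spark-submit invocations, while a warm run would need 22.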

CSharp

Ensure that Microsoft.Spark.Worker is properly installed on your cluster.
Build microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar and the C# TPC-H benchmark application by following the build instructions.
Upload run_csharp_benchmark.sh, the TPC-H benchmark application, and microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar to the cluster.

Run the benchmark by invoking:

run_csharp_benchmark.sh \
<number of cold iterations> \
<num_executors> \
<driver_memory> \
<executor_memory> \
<executor_cores> \
</path/to/Tpch.dll> \
</path/to/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar> \
</path/to/Tpch executable> \
</path/to/dataset> \
<number of iterations> \
<true for sql tests, false for functional tests>

Python

Upload run_python_benchmark.sh and all Python TPC-H benchmark files to the cluster.

Run the benchmark by invoking:

run_python_benchmark.sh \
<number of cold iterations> \
<num_executors> \
<driver_memory> \
<executor_memory> \
<executor_cores> \
</path/to/tpch.py> \
</path/to/dataset> \
<number of iterations> \
<true for sql tests, false for functional tests>

Scala

Run mvn package to build the Scala TPC-H benchmark application.
Upload run_scala_benchmark.sh and the microsoft-spark-benchmark-<version>.jar to the cluster.

Run the benchmark by invoking:

run_scala_benchmark.sh \
<number of cold iterations> \
<num_executors> \
<driver_memory> \
<executor_memory> \
<executor_cores> \
</path/to/microsoft-spark-benchmark-<version>.jar> \
</path/to/dataset> \
<number of iterations> \
<true for sql tests, false for functional tests>
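All three scripts take the same leading arguments (cold iterations through executor cores) and the same trailing arguments (dataset path, iterations, SQL flag); they differ only in the application paths in between. The sketch below builds the argument list in that documented order so it can be inspected or passed to `subprocess.run`. `benchmark_command` is a hypothetical helper; the `./` prefix assumes the scripts are run from the directory they were uploaded to, and the example values (including the `8g` memory strings) are placeholders.

```python
import shlex
from typing import List, Sequence

# Script names from this directory; "./" assumes they are run from the upload directory.
SCRIPTS = {
    "csharp": "./run_csharp_benchmark.sh",
    "python": "./run_python_benchmark.sh",
    "scala": "./run_scala_benchmark.sh",
}

# Language-specific path arguments between <executor_cores> and </path/to/dataset>:
#   csharp: Tpch.dll, microsoft-spark-*.jar, Tpch executable
#   python: tpch.py
#   scala:  microsoft-spark-benchmark-<version>.jar
APP_ARG_COUNT = {"csharp": 3, "python": 1, "scala": 1}


def benchmark_command(language: str, cold_iterations: int, num_executors: int,
                      driver_memory: str, executor_memory: str, executor_cores: int,
                      app_args: Sequence[str], dataset: str, iterations: int,
                      sql: bool) -> List[str]:
    """Return the argument list for a run_<language>_benchmark.sh call, in documented order."""
    expected = APP_ARG_COUNT[language]
    if len(app_args) != expected:
        raise ValueError(f"{language} expects {expected} path argument(s), got {len(app_args)}")
    return [
        SCRIPTS[language],
        str(cold_iterations),
        str(num_executors),
        driver_memory,
        executor_memory,
        str(executor_cores),
        *app_args,
        dataset,
        str(iterations),
        "true" if sql else "false",  # true for SQL tests, false for functional tests
    ]


if __name__ == "__main__":
    # Hypothetical values; adjust paths and resources to your cluster.
    cmd = benchmark_command("python", 1, 4, "8g", "8g", 4,
                            ["/home/user/tpch/tpch.py"], "/data/tpch", 3, sql=True)
    print(shlex.join(cmd))
```

Validating the number of application paths up front catches the most likely mistake (for example, omitting the Tpch executable for C#) before anything is submitted to the cluster.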
