Add Pyspark.pandas to benchmark #246

TjommeVergauwen · 2022-04-02T11:05:45Z

Are there any plans to add Pyspark.pandas to the benchmark?

jangorecki · 2022-04-02T12:15:54Z

Do you expect it to perform differently from pyspark.sql? Do you think it will be faster or slower?
I think it makes sense to keep only one of them rather than maintain both. Running each solution costs a couple of hours on a high-spec machine, so I would avoid benchmarking Spark interfaces (SQL/pandas) and focus on the engine. I am sure they share the same Spark engine.
