Do you expect it to perform differently from pyspark.sql? Do you think it will be faster or slower?
I think it makes sense to keep only one of them rather than maintain both. Running each solution costs a couple of hours on a high-spec machine, so I would avoid benchmarking Spark interfaces (SQL/pandas) and focus on the engine. I am sure they share the same Spark engine.
Are there any plans to add pyspark.pandas to the benchmark?