[SPARK-4384] [PySpark] improve sort spilling
If there are large broadcasts (or other objects) in the Python worker, the free memory available for sorting becomes too small. The sorter then keeps spilling small files to disk and eventually fails with "too many open files". This PR delays spilling until used memory exceeds the limit and has started to increase since the last spill. This increases the size of the spilled files and improves stability and performance in these cases. (We also do this in ExternalAggregator.)

Author: Davies Liu <[email protected]>

Closes apache#3252 from davies/sort and squashes the following commits:

711fb6c [Davies Liu] improve sort spilling
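
The delayed-spilling idea can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `next_limit`, `external_sort`, the `growth` factor of 1.05, and the `get_used_memory(buffered)` probe are hypothetical names and values chosen for illustration. The sketch assumes that after each spill the threshold is reset to the larger of the configured limit and the current memory use scaled by a small factor, so memory held by other objects does not trigger a spill on every batch.

```python
import heapq
import itertools
import os
import pickle
import tempfile


def next_limit(memory_limit, used_memory, growth=1.05):
    """Threshold that used memory must exceed before the next spill.

    With growth=0 this degenerates to the fixed limit (the naive policy).
    """
    return max(memory_limit, used_memory * growth)


def external_sort(items, memory_limit, get_used_memory, batch=100, growth=1.05):
    """Sort items, spilling sorted runs to disk when memory grows past the limit.

    get_used_memory(buffered) is a hypothetical probe that returns current
    memory use given the number of buffered items.
    Returns (sorted_list, number_of_spills).
    """
    with tempfile.TemporaryDirectory() as spill_dir:
        limit = next_limit(memory_limit, get_used_memory(0), growth)
        paths, current = [], []
        it = iter(items)
        while True:
            # Pull elements in batches so memory is checked periodically.
            chunk = list(itertools.islice(it, batch))
            current.extend(chunk)
            if len(chunk) < batch:
                break
            if get_used_memory(len(current)) > limit:
                # Sort in place, write one run to disk, and free the buffer.
                current.sort()
                path = os.path.join(spill_dir, "run-%d.pkl" % len(paths))
                with open(path, "wb") as f:
                    pickle.dump(current, f)
                paths.append(path)
                current = []
                # Re-base the threshold on what is still in use after the spill.
                limit = next_limit(memory_limit, get_used_memory(0), growth)
        current.sort()
        runs = [current]
        for path in paths:
            with open(path, "rb") as f:
                runs.append(pickle.load(f))
        # Merge the sorted runs into one sorted list.
        return list(heapq.merge(*runs)), len(paths)
```

Calling `external_sort` with `growth=0` reproduces the fixed-limit behaviour, which spills after every batch whenever baseline memory use already exceeds the limit. With the default `growth`, each spill waits until memory has actually grown since the previous one, so fewer and larger files are written.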