forked from ray-project/ray
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Datasets] Bundle blocks smaller than batch size in map_batches tasks. (ray-project#28648)

When blocks are smaller than the batch size in map_batches(), the batches actually passed to the UDF can be much smaller (by an order of magnitude) than the specified batch size. This results in very poor batch mapping throughput when the UDF's performance is sensitive to batch size, such as in batch inference on GPUs. This PR optimistically sends multiple blocks to a single mapper task, up to (but not past) the provided batch size, which mitigates the case where block size << batch size without needing a shuffle step.
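The bundling idea described above can be illustrated with a minimal sketch. This is not the code from the PR: `bundle_blocks` and its parameters are hypothetical names, and the sketch assumes only that the row count of each block is known up front. It greedily groups consecutive blocks so that each group stays at or below the batch size; a block that is already larger than the batch size ends up in a group of its own.

```python
from typing import List, Optional


def bundle_blocks(block_rows: List[int], batch_size: Optional[int]) -> List[List[int]]:
    """Group consecutive block indices into bundles of at most batch_size rows.

    Hypothetical helper for illustration only; not Ray's implementation.
    block_rows[i] is the number of rows in block i.
    """
    if batch_size is None:
        # No target batch size: one block per mapper task.
        return [[i] for i in range(len(block_rows))]

    bundles: List[List[int]] = []
    current: List[int] = []
    current_rows = 0
    for i, rows in enumerate(block_rows):
        # Close the current bundle if adding this block would go past batch_size.
        if current and current_rows + rows > batch_size:
            bundles.append(current)
            current, current_rows = [], 0
        current.append(i)
        current_rows += rows
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    # Four 10-row blocks, one oversized 50-row block, one 5-row block.
    print(bundle_blocks([10, 10, 10, 10, 50, 5], batch_size=32))
```

Each inner list would correspond to the blocks handed to one mapper task. Because blocks are only grouped in their existing order, no rows move between blocks, which is why no shuffle step is required.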
1 parent e142be0, commit b5687a1
Showing 12 changed files with 434 additions and 126 deletions.