forked from pytorch/FBGEMM
Commit
Make evicted_rows a UVA buffer (pytorch#3079)
Summary:
Pull Request resolved: pytorch#3079

X-link: facebookresearch/FBGEMM#173

Prior to this diff, SSD-TBE used a combination of a pinned CPU buffer and a GPU buffer for `evicted_rows` (the buffer for staging rows that are evicted from the L1 cache). It explicitly performed an asynchronous memory copy (via `cudaMemcpyAsync`) to transfer `evicted_rows` from device to host. Since the number of evicted rows is known only on the device, SSD-TBE overallocated both the CPU and GPU `evicted_rows` buffers, and the device-to-host copy therefore transferred extra data. This extra data could be large and could make the memory copy a bottleneck of the execution.

This diff mitigates the problem by using a unified address (UVA) buffer for `evicted_rows` and using a kernel (namely `masked_index_select`) to load/store data instead of a CUDA memory copy operation. This mechanism avoids copying the extra data. However, the kernel-based copy can be less efficient (it might not fully saturate the available memory bandwidth) since it does not use the copy engine. Moreover, since it uses SMs for the copy, it can compete for SM resources with other compute kernels when the operator is overlapped with them.

Reviewed By: q10

Differential Revision: D62114877

fbshipit-source-id: d91ad0be2820ac21033270f14a784a0bc3193d78
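The trade-off described in the summary can be illustrated with a small pure-Python sketch. This is not the FBGEMM kernel or its API: `masked_index_select_sketch`, `bytes_moved`, the buffer sizes, and the index values are all hypothetical stand-ins, chosen only to show why copying `count` rows moves less data than copying the whole overallocated buffer.

```python
def masked_index_select_sketch(dst, src, indices, count):
    """Copy src[indices[i]] into dst[i] for i < count only.

    Hypothetical emulation of a masked, count-bounded row copy.
    Rows at positions >= count are left untouched in dst.
    """
    for i in range(count):
        dst[i] = list(src[indices[i]])
    return dst


def bytes_moved(num_rows, row_dim, bytes_per_elem=4):
    """Bytes transferred when copying num_rows rows of row_dim elements."""
    return num_rows * row_dim * bytes_per_elem


# Assumed sizes, for illustration only.
capacity = 8   # overallocated number of staging rows
row_dim = 4    # elements per row

# Stand-in for cached rows; row r holds the value float(r) in every slot.
cache = [[float(r)] * row_dim for r in range(16)]

# Only the first `actual_count` indices are valid; the rest are padding.
evicted_indices = [3, 7, 11, -1, -1, -1, -1, -1]
actual_count = 3  # in the real system this is known only on the device

# Stand-in for the staging buffer that both host and device can address.
evicted_rows = [[0.0] * row_dim for _ in range(capacity)]
masked_index_select_sketch(evicted_rows, cache, evicted_indices, actual_count)

# A whole-buffer copy must move every allocated row ...
full_copy_bytes = bytes_moved(capacity, row_dim)
# ... while the count-bounded copy moves only the rows actually evicted.
masked_copy_bytes = bytes_moved(actual_count, row_dim)
```

With these assumed sizes, the whole-buffer copy moves 128 bytes while the count-bounded copy moves 48. The sketch says nothing about achieved bandwidth, which is where the summary notes the kernel-based approach may lose ground to the copy engine.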
1 parent e48b9f7, commit 53d84ad
Showing 1 changed file with 58 additions and 15 deletions.