forked from pytorch/FBGEMM
Commit
Optimize the inner most loop for jagged tensor implementation (pytorch#1041)

Summary:
Pull Request resolved: pytorch#1041

Optimize the innermost loop: for FP16, we prefer 128-byte accesses per warp (32 threads), because the cache line size is 128 bytes on A100 GPUs.

Reviewed By: jasonjk-park
Differential Revision: D35532377
fbshipit-source-id: bcb7e82cd817b90203d78244f78597c9f8f41b7e
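The arithmetic behind the 128-byte figure can be sketched as follows. This is a minimal host-side illustration and not the code from this commit: the helper names `elems_per_thread` and `warp_access_bytes` are hypothetical, and the only inputs taken from the commit message are the warp size (32 threads) and the A100 cache line size (128 bytes). Under those assumptions, each thread would load 4 bytes, which is two FP16 values per access.

```cpp
#include <cstddef>

// Values quoted in the commit message.
constexpr std::size_t kWarpSize = 32;        // threads per warp
constexpr std::size_t kCacheLineBytes = 128; // cache line size on A100

// Hypothetical helper (not from FBGEMM): number of consecutive elements
// each thread loads so that one warp-wide access spans a full cache line.
// Falls back to 1 when a single element is already wider than the
// per-thread share of the cache line.
constexpr std::size_t elems_per_thread(std::size_t elem_bytes) {
  return (kCacheLineBytes / kWarpSize >= elem_bytes)
             ? (kCacheLineBytes / kWarpSize) / elem_bytes
             : 1;
}

// Hypothetical helper: bytes touched by one warp in a single access.
constexpr std::size_t warp_access_bytes(std::size_t elem_bytes) {
  return kWarpSize * elems_per_thread(elem_bytes) * elem_bytes;
}

// FP16 is 2 bytes wide: two elements per thread, 128 bytes per warp.
static_assert(elems_per_thread(2) == 2, "FP16: two elements per thread");
static_assert(warp_access_bytes(2) == 128, "FP16: one cache line per warp");
```

With one FP16 element per thread, a warp would touch only 64 bytes per access, half a cache line; pairing elements per thread is one way to reach the full 128 bytes.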
1 parent 9147ea2, commit eacd342
Showing 2 changed files with 43 additions and 12 deletions.