forked from pytorch/FBGEMM
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add manual loop unroll for rocm devices in fwd pass (pytorch#3345)
Summary:
Pull Request resolved: pytorch#3345

X-link: facebookresearch/FBGEMM#438

Added another instance of the forward (FWD) pass for ROCm devices. The load and accumulate/store macros are split manually, and the outer loops are unrolled manually. Added a run-time guard check for ROCm devices. The current limitation is L % 4 == 0.

Some of the forward unit tests fail (probably due to the guard check); this needs investigation. Until the unit tests are fixed, this PR is kept as a draft. Performance measurements will be shared via another channel.

cc: liligwu amathews-amd

Pull Request resolved: pytorch#3309

Reviewed By: jianyuh

Differential Revision: D65620886

Pulled By: leitian

fbshipit-source-id: 59ae2e4869c860d575bfb05710525dd1d1dc3761
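For readers unfamiliar with the pattern, here is a minimal host-side C++ sketch of the idea in the summary: a load phase split from the accumulate/store phase, an outer loop manually unrolled by 4, and a run-time guard that falls back to a generic path when L % 4 != 0. It is not the kernel code from this commit. It assumes that L denotes the number of rows pooled per bag, and every name in it (`can_use_unrolled_path`, `pooled_sum_generic`, `pooled_sum_unrolled4`, `pooled_sum`) is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical guard mirroring the stated limitation L % 4 == 0.
inline bool can_use_unrolled_path(std::size_t L) { return L % 4 == 0; }

// Generic path: one row per outer-loop iteration, any L.
// `table` is a flattened row-major matrix with rows of width D.
void pooled_sum_generic(const std::vector<float>& table, std::size_t D,
                        const std::vector<std::size_t>& indices,
                        std::vector<float>& out) {
  out.assign(D, 0.0f);
  for (std::size_t l = 0; l < indices.size(); ++l) {
    const float* row = table.data() + indices[l] * D;
    for (std::size_t d = 0; d < D; ++d) out[d] += row[d];
  }
}

// Unrolled path: the outer loop over L advances four rows at a time.
// Caller must ensure indices.size() % 4 == 0.
void pooled_sum_unrolled4(const std::vector<float>& table, std::size_t D,
                          const std::vector<std::size_t>& indices,
                          std::vector<float>& out) {
  out.assign(D, 0.0f);
  const std::size_t L = indices.size();
  for (std::size_t l = 0; l < L; l += 4) {
    // Load phase: resolve all four row pointers first.
    const float* r0 = table.data() + indices[l + 0] * D;
    const float* r1 = table.data() + indices[l + 1] * D;
    const float* r2 = table.data() + indices[l + 2] * D;
    const float* r3 = table.data() + indices[l + 3] * D;
    // Accumulate/store phase: combine the four rows, then store once.
    for (std::size_t d = 0; d < D; ++d) {
      out[d] += (r0[d] + r1[d]) + (r2[d] + r3[d]);
    }
  }
}

// Dispatcher with the run-time guard check.
void pooled_sum(const std::vector<float>& table, std::size_t D,
                const std::vector<std::size_t>& indices,
                std::vector<float>& out) {
  if (can_use_unrolled_path(indices.size())) {
    pooled_sum_unrolled4(table, D, indices, out);
  } else {
    pooled_sum_generic(table, D, indices, out);
  }
}
```

Calling `pooled_sum` with four indices takes the unrolled path, while three indices fall back to the generic loop. The unrolled path adds the rows in a different order, so floating-point results can differ from the generic path in the last bits for general inputs. That is one thing worth ruling out when comparing the two paths in unit tests.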
1 parent 9bddf70, commit 247dd46
Showing 1 changed file with 305 additions and 0 deletions.