Commit
Summary:
Pull Request resolved: pytorch/pytorch#59541
Pull Request resolved: pytorch#621

Fixing 2 issues. These are actually 2 independent issues, one in Caffe2 and another in FBGEMM, so there is no need to wait until FBGEMM is synchronized with PyTorch.

1) conv 16-bit accumulation doesn't support the fast gconv path, so TakeGConvFastPath_ should honor it.

2) packed_index_ generates indices up to (G/GTogether_) \* F \* R \* S \* OC_per_G \* GTogether_ \* paddedICPerG, which can exceed the G \* kernel_prod \* OC_per_G \* paddedICPerG allocated in PackWeightMatrixForGConv (kernel_prod = F \* R \* S). E.g., when G=3 and GTogether_=2, we allocate 3 \* F \* R \* S \* OC_per_G \* paddedICPerG but access up to 2 \* F \* R \* S \* OC_per_G \* 2 \* paddedICPerG.

BTW, not sure how we haven't known about this issue for so long. Any ideas would be really appreciated.

Reviewed By: dskhudia

Differential Revision: D28927214

fbshipit-source-id: 3ec98ea2fc177545392a0148daca592d80f40ad3
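For illustration, here is a minimal sketch of the shape of fix 1, not the actual Caffe2 ConvDNNLowPOp code: the gconv fast-path kernels only accumulate in 32 bits, so the fast-path check must also consider the requested accumulation width. The flag `accumulate_into_16bit` is hypothetical and stands in for the op's ACC16 configuration; `fbgemm::fbgemmOptimizedGConv` is FBGEMM's existing fast-path predicate.

```cpp
#include "fbgemm/Fbgemm.h" // for conv_param_t and fbgemmOptimizedGConv

// Sketch only: refuse the gconv fast path when 16-bit accumulation is
// requested, since the fast gconv kernels only support 32-bit
// accumulation. `accumulate_into_16bit` is a hypothetical flag.
bool TakeGConvFastPath(
    const fbgemm::conv_param_t<>& conv_p, bool accumulate_into_16bit) {
  if (accumulate_into_16bit) {
    return false; // ACC16 mode: no gconv fast-path kernels available
  }
  // Otherwise fall back to FBGEMM's usual gconv fast-path check.
  return fbgemm::fbgemmOptimizedGConv(conv_p);
}
```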
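And a small standalone program working through the arithmetic of issue 2 with the G=3, GTogether_=2 example from the summary. The kernel and channel sizes here are illustrative, and the rounding of the group dimension up to a multiple of GTogether_ is an assumption consistent with the example (2 = ceil(3/2)):

```cpp
#include <cstdio>

int main() {
  // Example from the summary: G=3 groups, packed GTogether_=2 at a time.
  const int G = 3, GTogether = 2;
  const int F = 1, R = 3, S = 3;            // illustrative kernel dims
  const int OC_per_G = 8, paddedICPerG = 4; // illustrative channel sizes
  const int kernel_prod = F * R * S;

  // Buffer size allocated in PackWeightMatrixForGConv:
  const int allocated = G * kernel_prod * OC_per_G * paddedICPerG;

  // Upper bound of the indices packed_index_ generates, assuming the
  // group dimension is rounded up to a multiple of GTogether_:
  const int groups_rounded = (G + GTogether - 1) / GTogether; // = 2
  const int accessed =
      groups_rounded * kernel_prod * OC_per_G * GTogether * paddedICPerG;

  // With G=3, GTogether_=2: accessed = 4 * kernel_prod * OC_per_G *
  // paddedICPerG, one full group past the 3 groups allocated.
  printf("allocated %d elements, accessed up to %d\n", allocated, accessed);
  return 0;
}
```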