Restructure code gen for FBGEMMFP16 (pytorch#236)
Summary:
Pull Request resolved: pytorch#236

1) Merge the first iteration's data load with compute to reduce pressure on the load port. Loads and FMAs execute in parallel.
2) Mask the latency of the Fp16->Fp32 conversion by starting the conversion of the B values required by the next iteration.
!!! If the size of matrix B is exactly one page, this will generate a segfault.
3) Add per-matrix prefetch distance control.

Codegen built with:
g++ --std=c++11 fbgemm/src/codegen_fp16fp32.cc -o code_gen
./code_gen

Reviewed By: dskhudia

Differential Revision: D19222915

fbshipit-source-id: 3cef33e99f636c6355245a4618b03787a06e41fa
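
The idea in item 2 of the summary can be illustrated with a minimal scalar sketch. This is not the generated FBGEMM kernel, which is emitted as vectorized assembly. The names `half_to_float` and `dot_pipelined` are hypothetical, and the portable bit-level conversion only stands in for a hardware Fp16->Fp32 conversion. The sketch converts the B value for iteration i + 1 while iteration i computes. It also assumes that the segfault warning refers to the look-ahead reading past the end of B on the last iteration, which would fault if B ends exactly at a page boundary. The bounds guard is an addition for illustration and is not taken from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: portable fp16 -> fp32 conversion, standing in for a
// hardware conversion instruction.
static float half_to_float(uint16_t h) {
  const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  const uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t man = h & 0x3FFu;
  uint32_t bits;
  if (exp == 0) {
    if (man == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal half: shift the mantissa until the implicit bit appears.
      int e = -1;
      do {
        man <<= 1;
        ++e;
      } while ((man & 0x400u) == 0);
      bits = sign | (static_cast<uint32_t>(127 - 15 - e) << 23) |
             ((man & 0x3FFu) << 13);
    }
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (man << 13);  // infinity or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (man << 13);  // rebias 15 -> 127
  }
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical example: dot product of fp32 A with fp16 B, software-pipelined
// so that the conversion for the next iteration overlaps the current FMA.
float dot_pipelined(const float* a, const uint16_t* b, std::size_t k) {
  if (k == 0) return 0.0f;
  float acc = 0.0f;
  float b_cur = half_to_float(b[0]);  // prologue: convert the first B value
  for (std::size_t i = 0; i < k; ++i) {
    float b_next = 0.0f;
    // Guard (assumption, not from the commit): without it, the last
    // iteration would read b[k], one element past the end of B.
    if (i + 1 < k) b_next = half_to_float(b[i + 1]);
    acc += a[i] * b_cur;  // compute with the value converted one step earlier
    b_cur = b_next;
  }
  return acc;
}
```

The guard adds a branch to the loop. A real kernel might instead peel the last iteration or pad B, so the check here is only one possible way to avoid the out-of-bounds read.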