Restructure code gen for FBGEMMFP16 (#236) · LinGongHeng/FBGEMM@bbb676a

Commit

Restructure code gen for FBGEMMFP16 (pytorch#236)

Summary:
Pull Request resolved: pytorch#236

1) Merge the first iteration with data load and compute to reduce pressure on the load port. Loads and FMAs execute in parallel.
2) Mask the latency of the FP16->FP32 conversion by starting the conversion of the B values required by the next iteration. !!! If the size of matrix B is exactly one page, this will generate a segfault.
3) Add per-matrix prefetch distance control.
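
The latency-masking idea in item 2 can be illustrated with a minimal scalar sketch. This is not the generated kernel. The names `half_to_float` and `dot_pipelined` are hypothetical, the conversion is done in portable C++ rather than with a hardware conversion instruction, and a simple dot product stands in for the GEMM inner loop. The loop converts the B value for the next iteration before issuing the multiply-add for the current one, and the last iteration is peeled so the sketch never reads past the end of B. Whether that peeling relates to the page-size caveat above is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: portable IEEE 754 half -> float conversion.
inline float half_to_float(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t man = h & 0x3FFu;
  std::uint32_t bits;
  if (exp == 0) {
    if (man == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal half: shift the mantissa until the implicit bit appears.
      std::uint32_t shift = 0;
      while ((man & 0x400u) == 0) {
        man <<= 1;
        ++shift;
      }
      man &= 0x3FFu;
      bits = sign | ((113u - shift) << 23) | (man << 13);
    }
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (man << 13);  // infinity or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (man << 13);  // rebias 15 -> 127
  }
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical sketch: sum of a[i] * float(b[i]) with the conversion of
// b[i + 1] started before the multiply-add that consumes b[i].
inline float dot_pipelined(const float* a, const std::uint16_t* b,
                           std::size_t k) {
  if (k == 0) return 0.0f;
  float acc = 0.0f;
  float b_cur = half_to_float(b[0]);  // prologue: convert the first B value
  for (std::size_t i = 0; i + 1 < k; ++i) {
    const float b_next = half_to_float(b[i + 1]);  // next iteration's B
    acc += a[i] * b_cur;                           // current multiply-add
    b_cur = b_next;
  }
  acc += a[k - 1] * b_cur;  // epilogue: no read beyond b[k - 1]
  return acc;
}
```

In a real kernel the conversion and the FMA would be separate vector instructions, so issuing the conversion one iteration early lets the two overlap instead of serializing.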

Codegen built with:
g++ --std=c++11 fbgemm/src/codegen_fp16fp32.cc -o code_gen
./code_gen

Reviewed By: dskhudia

Differential Revision: D19222915

fbshipit-source-id: 3cef33e99f636c6355245a4618b03787a06e41fa

efiks authored and jspark1105 committed Mar 21, 2020

1 parent cc2bc13 commit bbb676a

0 comments on commit `bbb676a`
