Restructure code gen for FBGEMMFP16 (pytorch#236)
Summary:
Pull Request resolved: pytorch#236

1) Merge data load and compute in the 1st iteration to reduce pressure on the load port. Loads and FMAs execute in parallel.
2) Mask the latency of the Fp16->Fp32 conversion by starting the conversion of the B values required by the next iteration. WARNING: if the size of matrix B is exactly one page, this will generate a segfault.
3) Add per-matrix prefetch distance control.
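
As a rough illustration of items 2) and 3), here is a minimal scalar sketch of the idea. It is not the generated kernel: the names half_to_float, dot_fp16_pipelined and prefetch_distance are hypothetical, the conversion is a portable stand-in for a hardware Fp16->Fp32 instruction, and __builtin_prefetch is assumed to be available (GCC/Clang). The look-ahead load is guarded here so the sketch never reads past the end of B.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable Fp16 -> Fp32 conversion (hypothetical stand-in for a hardware convert).
static float half_to_float(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  const std::uint32_t man = h & 0x3FFu;
  if (exp == 0) {
    // Zero or subnormal: value is man * 2^-24.
    const float v = std::ldexp(static_cast<float>(man), -24);
    return sign ? -v : v;
  }
  std::uint32_t bits;
  if (exp == 31) {
    bits = sign | 0x7F800000u | (man << 13);  // Inf or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (man << 13);  // rebias 15 -> 127
  }
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Dot product of Fp32 a[] with Fp16 b_half[], software-pipelined so that the
// conversion for iteration i + 1 is issued before the FMA of iteration i.
float dot_fp16_pipelined(const float* a, const std::uint16_t* b_half,
                         std::size_t n, std::size_t prefetch_distance) {
  if (n == 0) return 0.0f;
  float acc = 0.0f;
  float b_cur = half_to_float(b_half[0]);  // prologue: convert the first B value
  for (std::size_t i = 0; i < n; ++i) {
    float b_next = 0.0f;
    if (i + 1 < n) {
      // Start converting the next B value early; guarded against reading past the end.
      b_next = half_to_float(b_half[i + 1]);
    }
#if defined(__GNUC__) || defined(__clang__)
    if (i + prefetch_distance < n) {
      // Caller-chosen prefetch distance, in elements (assumption for this sketch).
      __builtin_prefetch(b_half + i + prefetch_distance);
    }
#endif
    acc = std::fma(a[i], b_cur, acc);  // compute with the already-converted value
    b_cur = b_next;
  }
  return acc;
}
```

Passing a different prefetch_distance per call mirrors the per-matrix control in a loose sense only; the generated code works on vector registers rather than scalars.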

Codegen built with:
g++ --std=c++11 fbgemm/src/codegen_fp16fp32.cc -o code_gen
./code_gen

Reviewed By: dskhudia

Differential Revision: D19222915

fbshipit-source-id: 3cef33e99f636c6355245a4618b03787a06e41fa
efiks authored and jspark1105 committed Mar 21, 2020
1 parent cc2bc13 commit bbb676a
Showing 4 changed files with 3,074 additions and 765 deletions.