remove unnecessary prefetch for scale/bias in corner cases (pytorch#323)
Summary:
Pull Request resolved: pytorch#323

In D20582936 we missed a case: when fused_block_size is exactly a multiple of 64 (the cache-line length), no extra prefetch is needed for scale and bias.

Reviewed By: shz0116

Differential Revision: D20585862

fbshipit-source-id: ad19add49188dc14720429a7240bab7eb732b8dc
jspark1105 authored and facebook-github-bot committed Mar 22, 2020
1 parent d7c4a34 commit 4b36034
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/EmbeddingSpMDM.cc
@@ -541,7 +541,7 @@ typename ReturnFunctionSignature<inType, indxType, ROWWISE_SPARSE>::
       a->vbroadcastss(scale_vreg, scale_src);
       a->vbroadcastss(bias_vreg, bias_src);

-      if (pref_dist &&
+      if (pref_dist && fused_block_size % CACHE_LINE_LEN > 0 &&
           fused_block_size % CACHE_LINE_LEN <= 2 * sizeof(float)) {
         a->prefetcht0(x86::dword_ptr(
             input,
2 changes: 1 addition & 1 deletion src/EmbeddingSpMDMNBit.cc
@@ -585,7 +585,7 @@ GenEmbeddingSpMDMNBitLookup<indxType, ROWWISE_SPARSE>::getOrCreate(
       a->vcvtph2ps(
           vec_reg_t(bias_vreg.id()), half_vec_reg_t(bias_vreg.id()));
       constexpr int CACHE_LINE_LEN = 64;
-      if (pref_dist &&
+      if (pref_dist && fused_block_size % CACHE_LINE_LEN > 0 &&
           fused_block_size % CACHE_LINE_LEN <= 2 * sizeof(float16)) {
         a->prefetcht0(x86::dword_ptr(
             input,
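The guard changed in both files can be sketched in isolation. This is a minimal sketch, not FBGEMM's actual helper: the function name `needsExtraScaleBiasPrefetch` and the `elem_size` parameter (4 for float scale/bias, 2 for float16) are assumptions for illustration. The idea is that the main loop already prefetches every full cache line of the fused row; an extra prefetch is only worthwhile when the row's tail past the last 64-byte boundary consists of nothing but the scale/bias bytes.

```cpp
#include <cassert>

constexpr int CACHE_LINE_LEN = 64;

// Sketch: should we issue one extra prefetcht0 for the cache line
// holding the trailing scale/bias pair of a fused embedding row?
// elem_size is the byte size of one scale/bias element (assumed).
bool needsExtraScaleBiasPrefetch(int fused_block_size, int elem_size) {
  int rem = fused_block_size % CACHE_LINE_LEN;
  // rem == 0: the row ends exactly on a cache-line boundary, so the
  // scale/bias bytes were already covered by the regular prefetches
  // (this is the corner case the commit fixes).
  // 0 < rem <= 2 * elem_size: only the scale/bias bytes spill into the
  // final cache line, which the main loop skips, so prefetch it.
  return rem > 0 && rem <= 2 * elem_size;
}
```

For example, with float scale/bias (`elem_size = 4`), a fused row of 128 bytes needs no extra prefetch, while a row of 72 bytes (64 data bytes + 8 bytes of scale/bias) does.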
