Add a new config for row-wise quant fp8 gemm perf bench with fp8_fast_accum=false (pytorch#2686)

Summary:
Pull Request resolved: pytorch#2686

Adding a new autotune config (128x128x128 tiles, 3 stages, 8 warps) learned from cuBLAS.

Reviewed By: jianyuh

Differential Revision: D57746696

fbshipit-source-id: 1d34766a4aaa874d42338be2867d67be45a5152e
htyu authored and facebook-github-bot committed Jun 6, 2024
1 parent 85ed64c commit 900b05b
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py
@@ -332,7 +332,14 @@ def _kernel_matmul_fp8_row(


 @triton.autotune(
-    configs=MATMUL_CONFIGS,
+    configs=MATMUL_CONFIGS
+    + [
+        Config(
+            {"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 128, "SPLIT_K": 1},
+            num_stages=3,
+            num_warps=8,
+        ),
+    ],
     key=[
         "m_key",
         "n_key",
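For context, a minimal sketch of how the added config participates in Triton autotuning. Only triton.autotune, triton.Config, and triton.jit are real Triton APIs here; the placeholder kernel, its signature, the baseline MATMUL_CONFIGS entry, and the "k_key" key are illustrative stand-ins for the fbgemm definitions, not the actual fp8_gemm.py code.

```python
# Sketch: extending a Triton autotune search space with one extra config.
import triton
import triton.language as tl

# Illustrative stand-in for fbgemm's MATMUL_CONFIGS list.
MATMUL_CONFIGS = [
    triton.Config(
        {"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 64, "SPLIT_K": 1},
        num_stages=4,
        num_warps=4,
    ),
]


@triton.autotune(
    # The new entry widens the search space; the autotuner benchmarks every
    # config and caches the fastest one per (m_key, n_key, k_key) bucket.
    configs=MATMUL_CONFIGS
    + [
        triton.Config(
            {"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 128, "SPLIT_K": 1},
            num_stages=3,
            num_warps=8,
        ),
    ],
    key=["m_key", "n_key", "k_key"],
)
@triton.jit
def _kernel_matmul_fp8_row(
    A, B, C, m_key, n_key, k_key,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
    BLOCK_K: tl.constexpr, SPLIT_K: tl.constexpr,
):
    # Kernel body elided; each candidate config compiles this kernel with
    # its own tile sizes, and the fastest compiled variant is selected.
    pass
```

Since fp8_fast_accum=false forces slower full-precision accumulation, adding a larger-tile candidate lets the autotuner recover throughput on shapes where cuBLAS picks comparable tile sizes.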
