Auto-generation of CUTLASS Extension Kernel Templates (pytorch#2932)
Summary:
X-link: facebookresearch/FBGEMM#33

Pull Request resolved: pytorch#2932

This diff allows cutlass_extension to use configuration-based auto-instance generation. The diff aims to achieve the following:

(a) Many kernels need to be instantiated with varying template arguments, and it is impractical to instantiate them all by hand.
(b) Use and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API so that we can perturb all the template parameters that CUTLASS allows.
(d) Bullets (b) and (c) bring our internal usage closer to NVIDIA/CUTLASS, allowing us to upstream our kernels quickly to the NVIDIA/CUTLASS repo.
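To illustrate the idea of configuration-based auto-instance generation described above, here is a minimal, hypothetical Python sketch in the spirit of NVIDIA/CUTLASS's generator scripts. The parameter names, shapes, and emitted C++ type names are illustrative assumptions, not the actual FBGEMM or CUTLASS configuration space.

```python
import itertools

# Hypothetical configuration space; the real generator enumerates many more
# CUTLASS template parameters (schedules, layouts, epilogues, ...).
TILE_SHAPES = [(128, 128, 64), (64, 128, 64)]
CLUSTER_SHAPES = [(1, 1, 1), (2, 1, 1)]
DTYPES = [("e4m3", "e4m3", "bf16")]  # (A dtype, B dtype, output dtype)

def emit_instances():
    """Enumerate the template-parameter space and emit one C++
    instantiation string per configuration."""
    instances = []
    for (tm, tn, tk), (cm, cn, ck), (a, b, c) in itertools.product(
        TILE_SHAPES, CLUSTER_SHAPES, DTYPES
    ):
        name = f"gemm_{a}_{b}_{c}_{tm}x{tn}x{tk}_{cm}x{cn}x{ck}"
        decl = (
            f"using {name} = Gemm<"
            f"Tile<{tm},{tn},{tk}>, Cluster<{cm},{cn},{ck}>, "
            f"{a}_t, {b}_t, {c}_t>;"
        )
        instances.append((name, decl))
    return instances

for _, decl in emit_instances():
    print(decl)
```

Generating instances as a cartesian product of per-axis configuration lists is what makes hand-writing each kernel unnecessary: adding one tile shape automatically produces instantiations for every cluster shape and dtype combination.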

Reviewed By: ipiszy

Differential Revision: D60171966

fbshipit-source-id: 8dfd80223a7c40c79446a50b93c87bf339e7596a
manishucsd authored and facebook-github-bot committed Aug 26, 2024
1 parent d693267 commit de845bf
Showing 22 changed files with 1 addition and 2,656 deletions.
2 changes: 1 addition & 1 deletion fbgemm_gpu/experimental/gen_ai/bench/quantize_ops.py
@@ -514,7 +514,7 @@ def quantize(self, x, w):
         return xq, wq, x_scale, w_scale

     def compute(self, xq, wq, x_scale, w_scale):
-        return torch.ops.fbgemm.f8f8bf16_v2(xq, wq, x_scale * w_scale)
+        return torch.ops.cutlass_extensions.f8f8bf16(xq, wq, x_scale * w_scale)

     def quantize_and_compute(self, x, w):
         xq, wq, x_scale, w_scale = self.quantize(x, w)
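Note that both the old and new ops receive the single folded scale `x_scale * w_scale`. A minimal pure-Python sketch of why folding the two dequantization scales into one product works (assumed semantics for illustration, not the fbgemm/cutlass_extensions kernel itself):

```python
# Assumed FP8-style semantics: quantization stores integer codes plus a
# per-tensor scale, and the matmul epilogue dequantizes with the single
# folded scale x_scale * w_scale, which is why the benchmark passes
# their product as one argument.

def quantize(vals, scale):
    # Toy "quantize": divide by the scale and round to an integer code.
    return [round(v / scale) for v in vals]

def dot_dequant(xq, wq, combined_scale):
    # Integer dot product, then a single multiply by the folded scale.
    return sum(a * b for a, b in zip(xq, wq)) * combined_scale

x, w = [0.5, 1.0], [2.0, 4.0]
x_scale, w_scale = 0.25, 0.5
xq = quantize(x, x_scale)  # integer codes for x
wq = quantize(w, w_scale)  # integer codes for w
out = dot_dequant(xq, wq, x_scale * w_scale)
```

Because each product term `xq[i] * wq[i]` carries a factor of `1/(x_scale * w_scale)`, one multiply by the combined scale in the epilogue recovers the original dot product exactly (here `out == 5.0`, matching `0.5*2.0 + 1.0*4.0`).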
