Commit
Auto-generation of CUTLASS Extension Kernel Templates (pytorch#2932)
Summary:
X-link: facebookresearch/FBGEMM#33

Pull Request resolved: pytorch#2932

This diff allows cutlass_extension to use configuration-based auto-instance generation. The diff aims to achieve the following:
(a) Many kernels need to be instantiated with varying template arguments, and it is hard to instantiate them all by hand.
(b) Use and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API so that we can vary all the template parameters that CUTLASS allows.
(d) Bullets (b) and (c) bring our internal usage closer to NVIDIA/CUTLASS, so we can upstream our kernels quickly to the NVIDIA/CUTLASS repo.

Reviewed By: ipiszy

Differential Revision: D60171966

fbshipit-source-id: 8dfd80223a7c40c79446a50b93c87bf339e7596a
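As a rough illustration of what configuration-based instance generation could look like, the sketch below expands a small parameter table into one explicit-instantiation line per combination. It is not the script added by this diff: the `CONFIG` layout, the `ext::GemmKernel` template name, and `generate_instances` are all hypothetical, chosen only to show the idea of replacing hand-written instances with a cartesian product over template arguments.

```python
from itertools import product

# Hypothetical configuration (not taken from the diff): each key is a template
# parameter and each value lists the settings to instantiate.
CONFIG = {
    "element": ["cutlass::half_t", "cutlass::bfloat16_t"],
    "tile_shape": [(128, 128, 32), (64, 128, 64)],
    "stages": [3, 4],
}

# Hypothetical kernel template name; a real generator would emit whatever
# template the extension actually defines.
TEMPLATE = (
    "template class ext::GemmKernel<{element}, "
    "cutlass::gemm::GemmShape<{m}, {n}, {k}>, {stages}>;"
)


def generate_instances(config):
    """Return one explicit-instantiation line per combination of parameters."""
    keys = list(config)
    lines = []
    # Cartesian product over every parameter's list of candidate values.
    for values in product(*(config[key] for key in keys)):
        params = dict(zip(keys, values))
        m, n, k = params.pop("tile_shape")
        lines.append(TEMPLATE.format(m=m, n=n, k=k, **params))
    return lines


instances = generate_instances(CONFIG)
```

With two choices for each of the three parameters, this yields eight instantiation lines; adding a value to any list in `CONFIG` grows the set without touching the template text.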