Tags: EatenBagpipe/cutlass

v3.0.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Updates for 3.0 (NVIDIA#857)

Co-authored-by: Aniket Shivam <[email protected]>

v2.11.0

New updates for 2.11 (NVIDIA#775)

* New updates.

* Minor profiler updates

Co-authored-by: Aniket Shivam <[email protected]>

v2.10.0

CUTLASS 2.10 bug fixes and minor updates. (NVIDIA#626)

v2.9.1

Update linear_combination_generic.h (NVIDIA#472)

Add `skip_elementwise_` to support serial split-K in `linear_combination_generic.h`

v2.9.0

Update CMakeLists.txt (NVIDIA#473)

* Update CMakeLists.txt

Add 128-bit int support when using nvc++ to solve NVIDIA#310

@jeffhammond, would you please give it a try?

* Update CMakeLists.txt

correct copy paste error

v2.8.0

Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (NVIDIA#375)

Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit.

GPUs under test:

    NVIDIA A100
    NVIDIA A2
    NVIDIA TitanV
    NVIDIA GeForce 2080 Ti

v2.7.0

CUTLASS 2.7 (NVIDIA#318)

CUTLASS 2.7

* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
  * Use these when accumulation and epilogue compute types are all `cutlass::half_t`
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for smaller than 128b aligned Convolutions: see examples
* Caching of results to accelerate Convolution unit tests
  * Can be enabled or disabled by running `cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF`
* Corrections and bug fixes reported by the CUTLASS community
  * Thank you for filing these issues!
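
For readers unfamiliar with approximate GELU activations, here is a minimal host-side sketch. It assumes the widely used tanh-based GELU approximation. It is not taken from the CUTLASS sources, it uses `double` rather than `cutlass::half_t`, and the function names `gelu_exact`, `gelu_tanh_approx`, and `max_abs_error` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Reference GELU using the error function: 0.5 * x * (1 + erf(x / sqrt(2))).
double gelu_exact(double x) {
    return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0)));
}

// Tanh-based approximation (an assumption about what an approximate GELU
// might compute): 0.5 * x * (1 + tanh(k0 * x * (1 + k1 * x^2))).
double gelu_tanh_approx(double x) {
    const double k0 = 0.7978845608028654;  // sqrt(2 / pi)
    const double k1 = 0.044715;
    return 0.5 * x * (1.0 + std::tanh(k0 * x * (1.0 + k1 * x * x)));
}

// Largest absolute difference between the two functions, sampled at
// steps + 1 evenly spaced points on [lo, hi].
double max_abs_error(double lo, double hi, int steps) {
    assert(steps > 0);
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double x = lo + (hi - lo) * static_cast<double>(i) / steps;
        const double err = std::fabs(gelu_tanh_approx(x) - gelu_exact(x));
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

Calling `max_abs_error(-4.0, 4.0, 800)` gives a rough sense of how closely the approximation tracks the erf-based definition in double precision. A half-precision version would add its own rounding error on top.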

authored-by: Haicheng Wu [email protected], Manish Gupta [email protected], Dustyn Blasig [email protected], Andrew Kerr [email protected]

v2.6.1

CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning

* cutlass 2.6 update

* remove debug prints

* cutlass 2.6.1 (minor update)

* Updated CHANGELOG.

* Minor edit to readme to indicate patch version.

* Minor edit to readme.

Co-authored-by: Haicheng Wu <[email protected]>, Andrew Kerr <[email protected]>

v2.6.0

Merge pull request NVIDIA#308 from dongxiao92/patch-1

fix typo in doc

v2.5.0

Create PUBLICATIONS.md (NVIDIA#189)