Tags · EatenBagpipe/cutlass

v3.0.0

Updates for 3.0 (NVIDIA#857)

Co-authored-by: Aniket Shivam

Mar 9, 2023
c4f6b8c

v2.11.0

New updates for 2.11 (NVIDIA#775)

* New updates.

* Minor profiler updates

Co-authored-by: Aniket Shivam

Jan 20, 2023
66d9cdd

v2.10.0

CUTLASS 2.10 bug fixes and minor updates. (NVIDIA#626)

Sep 15, 2022
fc9ebc6

v2.9.1

Update linear_combination_generic.h (NVIDIA#472)

Add `skip_elementwise_` to support serial split-K in `linear_combination_generic.h`.

Jun 28, 2022
e45e773

v2.9.0

Update CMakeLists.txt (NVIDIA#473)

* Update CMakeLists.txt

Add 128-bit integer support when compiling with nvc++, to resolve NVIDIA#310.

@jeffhammond, would you please give it a try?

* Update CMakeLists.txt

Correct a copy-paste error.

Apr 27, 2022
319a389

v2.8.0

Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (NVIDIA#375)

Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit.

GPUs under test:

    NVIDIA A100
    NVIDIA A2
    NVIDIA TITAN V
    NVIDIA GeForce RTX 2080 Ti

Dec 6, 2021
5fe09c2

v2.7.0

CUTLASS 2.7 (NVIDIA#318)

CUTLASS 2.7

* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
  * Use these when the accumulation and epilogue compute types are both `cutlass::half_t`
* Tuning and bug fixes to the fused GEMM + GEMM example
* Support for convolutions with less than 128-bit alignment (see examples)
* Caching of results to accelerate Convolution unit tests
  * Controlled by the CMake option `CUTLASS_TEST_ENABLE_CACHED_RESULTS`; for example, `cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF` disables it
* Corrections and bug fixes reported by the CUTLASS community
  * Thank you for filing these issues!

Authored-by: Haicheng Wu, Manish Gupta, Dustyn Blasig, Andrew Kerr

Sep 20, 2021
2e07c4c

v2.6.1

CUTLASS 2.6.1: functional and performance enhancements to strided DGRAD, fixes, and tuning

* CUTLASS 2.6 update

* Remove debug prints

* CUTLASS 2.6.1 (minor update)

* Updated CHANGELOG.

* Minor edit to README to indicate the patch version.

* Minor edit to README.

Co-authored-by: Haicheng Wu, Andrew Kerr

Sep 3, 2021
6c2f8f2

v2.6.0

Merge pull request NVIDIA#308 from dongxiao92/patch-1

Fix typo in documentation

Aug 8, 2021
a01feb9

v2.5.0

Create PUBLICATIONS.md (NVIDIA#189)

Mar 3, 2021
0f10563
