Tags: NVIDIA/cudnn-frontend

v1.8.0

# cudnn frontend v1.8 release notes (#118)

## New API

### Paged Attention API
SDPA forward operation now supports paged attention on cudnn 9.5.0 and
later by setting the appropriate page-table descriptors.
`SDPA_attributes` now accepts `set_paged_attention_k_table` and
`set_paged_attention_v_table` to pass these descriptors. Please refer to
the samples for usage: [cpp
samples](samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp), [python
samples](samples/python/52_scaled_dot_product_attention_with_paged_caches.ipynb).
See [docs](docs/operations/Attention.md) for more API details.
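As a rough illustration, here is a hedged C++ sketch of wiring the page tables into an SDPA forward call. The shapes, strides, names, and the pre-existing `graph`, `Q`, `K_container`, `V_container`, `b`, and `blocks_per_seq` are illustrative assumptions, not the required layout; see the linked samples for a working setup.

```cpp
namespace fe = cudnn_frontend;

// Page tables hold, per batch, the block indices into the paged K/V containers.
// Dims, strides, and data types below are illustrative assumptions.
auto page_table_k = graph.tensor(fe::graph::Tensor_attributes()
                                     .set_name("page_table_k")
                                     .set_dim({b, 1, blocks_per_seq, 1})
                                     .set_stride({blocks_per_seq, blocks_per_seq, 1, 1})
                                     .set_data_type(fe::DataType_t::INT32));
auto page_table_v = graph.tensor(fe::graph::Tensor_attributes()
                                     .set_name("page_table_v")
                                     .set_dim({b, 1, blocks_per_seq, 1})
                                     .set_stride({blocks_per_seq, blocks_per_seq, 1, 1})
                                     .set_data_type(fe::DataType_t::INT32));

auto sdpa_attributes = fe::graph::SDPA_attributes()
                           .set_name("paged_sdpa")
                           .set_is_inference(true)
                           .set_paged_attention_k_table(page_table_k)   // new in 1.8
                           .set_paged_attention_v_table(page_table_v);  // new in 1.8

// Q is a regular dense tensor; K/V point at the paged containers.
auto [O, Stats] = graph.sdpa(Q, K_container, V_container, sdpa_attributes);
O->set_output(true);
```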

### CUDA Graph API
The cudnn graph API now allows users to directly build a native CUDA graph for a
given sub-graph (requires cudnn 9.5.0). There are two APIs:
 - `populate_cuda_graph` : adds the cudnn nodes to the empty CUDA graph
 provided as input.
 - `update_cuda_graph` : updates the populated CUDA graph with the necessary
 data pointers.

See [docs](docs/cuda_graphs.md) and [backend
documentation](https://docs.nvidia.com/deeplearning/cudnn/latest/api/cudnn-graph-library.html#cudnnbackendpopulatecudagraph)
for more details.
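A hedged sketch of the intended flow, assuming `graph` has already been built and that `handle`, `variant_pack` (the same UID-to-pointer map used for `execute`), `new_variant_pack`, `workspace_ptr`, and `stream` exist; the exact argument lists of the two calls are assumptions here, so consult the linked docs for the authoritative signatures.

```cpp
// Build an empty CUDA graph and let cudnn populate it with its nodes.
cudaGraph_t cudnn_cuda_graph;
cudaGraphCreate(&cudnn_cuda_graph, 0);
auto populate_status = graph.populate_cuda_graph(handle, variant_pack, workspace_ptr, cudnn_cuda_graph);
assert(populate_status.is_good());

cudaGraphExec_t cuda_graph_exec;
cudaGraphInstantiate(&cuda_graph_exec, cudnn_cuda_graph, 0);
cudaGraphLaunch(cuda_graph_exec, stream);

// On later iterations, refresh only the data pointers instead of re-populating.
auto update_status = graph.update_cuda_graph(handle, new_variant_pack, workspace_ptr, cudnn_cuda_graph);
assert(update_status.is_good());

// Depending on how the update propagates, the instantiated executable graph may
// need a cudaGraphExecUpdate before the next launch (assumption).
cudaGraphExecUpdateResultInfo update_info;
cudaGraphExecUpdate(cuda_graph_exec, cudnn_cuda_graph, &update_info);
cudaGraphLaunch(cuda_graph_exec, stream);
```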

### Enhancements

- Kernel cache for dynamic shapes is now supported in Python. Added a
[sample](test/python/test_kernel_cache.py) to showcase usage.

- `graph.deselect_engines(...)` now has a Python equivalent through
pybind11.

- `graph.tensor(...)` can now accept `int64_t` scalars directly.
(Previously limited to `int32_t`, `float`, and fp16 data types.)

- fp8 sdpa attention now allows dropout and padding masks. Requires cudnn
9.5.0 and above.

- More enhancements to pointwise output stride inferencing (for the
broadcast operation). For non-unary operands, the broadcast tensor can
now be at either IN_0 or IN_1; see the sketch after this list.

- SDPA backward operation now allows head dimension (d) up to 256 on Hopper.
Requires cudnn 9.5.0 and above.
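To illustrate the broadcast enhancement, here is a small hedged sketch where the broadcast operand sits at IN_0; the dims (`n`, `c`, `h`, `w`), strides, and the pre-existing `graph` are illustrative assumptions.

```cpp
namespace fe = cudnn_frontend;

// Broadcast operand passed first (IN_0): a per-channel bias added to a full tensor.
auto bias = graph.tensor(fe::graph::Tensor_attributes()
                             .set_name("bias")
                             .set_dim({1, c, 1, 1})
                             .set_stride({c, 1, 1, 1}));
auto x    = graph.tensor(fe::graph::Tensor_attributes()
                             .set_name("x")
                             .set_dim({n, c, h, w})
                             .set_stride({c * h * w, h * w, w, 1}));

// The output stride order is now inferred correctly whether the broadcast
// tensor is passed as IN_0 (as here) or as IN_1.
auto y = graph.pointwise(bias, x,
                         fe::graph::Pointwise_attributes()
                             .set_mode(fe::PointwiseMode_t::ADD));
y->set_output(true);
```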

### Bug fixes

- Fixed an issue when querying `cudnnGetLastErrorString()` from the
backend. The `error_t` object will now carry a more meaningful message.

- Fixed build issues seen with clang-19 compiler.

- Fixed an issue where a graph with bias in sdpa_bprop was assumed to
always have a dbias.

v1.7.0

# cudnn FE 1.7.0 release notes (#111)

## New API

- Kernel cache support for dynamic graphs
Added new APIs to enable kernel cache support for graphs with dynamic shapes. Please refer to [documentation](docs/dynamic_kernel_cache.md) for API details; a short sketch also follows at the end of this list.

Added examples `Convolution fprop dynamic shape`, `CSBR Graph dynamic shape`, `Matmul dynamic shape` and `Bias + Matmul dynamic shape` to showcase the use of dynamic shapes and the kernel cache.

- Two new APIs are introduced to describe a plan in terms of its engine number and knobs.
```
error_t
get_plan_name(std::string &name) const;

error_t
get_plan_name_at_index(int64_t plan_index, std::string &name) const;
```
Note:
This name can later be passed to `deselect_plan_by_name` if you run into any issues with that plan. See the sketch at the end of this list.

- Added an API to query a tensor's attributes from its UID in a graph.
`query_tensor_with_uid(int64_t const uid, Tensor_attributes &tensor) const;`
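A hypothetical C++ sketch tying these additions together, assuming a built `graph` and a known tensor UID `x_uid`; the kernel-cache type (`fe::KernelCache`) and setters (`set_dynamic_shape_enabled`, `set_kernel_cache`) are assumptions here, so refer to the linked documentation for the exact API.

```cpp
namespace fe = cudnn_frontend;

// Kernel cache shared by graphs that differ only in their (dynamic) shapes.
// Type and setter names are assumptions; see docs/dynamic_kernel_cache.md.
auto kernel_cache = std::make_shared<fe::KernelCache>();
graph.set_dynamic_shape_enabled(true).set_kernel_cache(kernel_cache);

// Describe a candidate plan (engine number + knobs) by index.
std::string plan_name;
if (graph.get_plan_name_at_index(0, plan_name).is_good()) {
    std::cout << "candidate 0: " << plan_name << "\n";
}
// If that plan later misbehaves, it can be skipped on a subsequent build:
// graph.deselect_plan_by_name(plan_name);

// Recover a tensor's attributes from its UID.
fe::graph::Tensor_attributes tensor_attributes;
auto query_status = graph.query_tensor_with_uid(x_uid, tensor_attributes);
assert(query_status.is_good());
```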

## Improvements

- sdpa fp16 bprop node can now compute dbias when padding mask is enabled.

- sdpa fp8 (forward and bprop) nodes now support optional bias, dropout and padding mask.

- Matmul fp8 node can now accept M,N,K overrides.

- Added new python notebooks for implementing BatchNorm and BatchNorm bprop using cuDNN.

- Updated [benchmark numbers](benchmark) with cudnn 9.4.0 for fp16 and fp8 datatypes.

- Fixed compilation issues when `NV_CUDNN_DISABLE_EXCEPTION` is enabled.

## Bug fixes

- Fixed a crash when the output dimension of the dgrad node is not specified. An error message is now returned instead.

- Fixed incorrect SDPA stats stride inferencing.

- Fixed a bug in the sdpa test when sliding window attention is enabled and the query sequence length (s_q) is greater than the key/value sequence length (s_kv). This case is now reported as not supported.

v1.6.1

cudnn FE 1.6.1 release (#99)

- Bug fixes

  - Fixed an issue where a custom dropout mask was not correctly applied.
  - Added `-fvisibility=hidden` for the generated pip wheels to avoid
  symbol conflicts with other modules that use cudnn frontend.
  - Fixed an issue in sdpa kernels which could lead to numerical
  mismatches.
  - Fixed an issue in sdpa fp8 fprop kernels (in inference mode).

- Samples

  - Added a new sample to showcase how a custom dropout mask can be
  applied to an sdpa operation.
  - Added a sample to showcase convolutions on large (`c * d * h * w > 2 **
  31`) tensors.

v1.6.0

v1.6.0 release

## New API

- Graph Slice Operation: Introduced the `graph.slice` operation for slicing input tensors. Refer to `docs/operations/Slice.md` for detailed documentation and `samples/cpp/misc/slice.cpp` for a C++ sample. Pybinds for this operation have also been added. A short sketch follows below.
- SM Carveout Feature: Added the `set_sm_count(int32_t type)` graph property to support the SM Carveout feature introduced in Ampere and Hopper GPUs. Engines that do not support SM_COUNT will return NOT_SUPPORTED.
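A brief, hedged illustration of both features; `set_sm_count` is quoted from the note above, while the `Slice_attributes` setter name and its per-dimension `{start, end}` pairs are assumptions made for illustration only (see `docs/operations/Slice.md` for the actual signature), as are `graph`, `input`, `b`, `h`, and `d`.

```cpp
namespace fe = cudnn_frontend;

// SM carveout: restrict execution to a subset of SMs. Engines that cannot
// honor this return NOT_SUPPORTED.
graph.set_sm_count(64);

// Slice an existing tensor `input`; the setter and the per-dimension
// {start, end} ranges below are assumptions for illustration only.
auto sliced = graph.slice(input,
                          fe::graph::Slice_attributes()
                              .set_name("slice_0")
                              .set_slices({{0, b}, {0, h}, {16, 48}, {0, d}}));
sliced->set_output(true);
```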
## Bug Fixes

- Convolution Mode Attribute: Added the missing `set_convolution_mode` attribute to convolution attributes in forward propagation (fprop), data gradient (dgrad), and weight gradient (wgrad). Previously, this was hardcoded to CUDNN_CROSS_CORRELATION in the 1.x API.
- SDPA FP8 Backward Node: Fixed an issue with the deserialization of the sdpa_fp8_backward node.

## Enhancements

- Graph Execution Overhead: Reduced the overhead of graph.execute() by optimizing sub-node tree traversal, collected UIDs, workspace modifications, and workspace size.
- Graph Validation Performance: Significantly improved (~10x) the performance of graph.validate() by deferring graph expansion to a later stage (build_operation_graph).
- Optional Running Stats for BatchNorm: Made the running statistics for the batch normalization operation optional, supported by cuDNN backend version 9.3.0 and later.
- Shape and Stride Inferencing: Enhanced shape and stride inferencing to preserve the stride order of the input.
- Diagnostic Error Message: Added a diagnostic error message to create_execution_plans if called without the preceding build_operation_graph.
- JSON Schema and Deserialization: Improved the JSON schema and deserialization logic with additional checks.
- Logging Overhead: Reduced logging overhead, resulting in faster graph.build() calls.
- CMake Integration: Replaced CMAKE_SOURCE_DIR with PROJECT_SOURCE_DIR in CMake files for better integration. See the relevant pull request for more details.

## Samples

- Jupyter Notebooks: Added Jupyter notebooks for RMSNorm, InstanceNorm, and LayerNorm. Refer to the samples/python folder for more information.

v1.5.2

Release notes for cudnn-frontend 1.5.2: (#86)

[Enhancement] Allows a stride value of 0, indicating repetition of the tensor in those dimensions.

v1.5.1

Release notes for cudnn-frontend 1.5.1: (#84)

[Bug fix] Fixed an issue where cudnn-frontend 1.5.0, when built with
cudnn version 9.1.1 or below, ran into issues when run with 9.2.0 and
above.

v1.5.0

Release notes for cudnn-frontend 1.5.0: (#81)

[New feature] With cudnn backend 9.2.0 and above, `Graph::check_support`
can determine support for runtime-compiled engines without invoking the
nvrtc compiler. This allows users to check the support surface of cudnn
without incurring nvrtc compilation.

[New feature] Python pip wheel now contains the necessary c++
development headers.

[New feature] Sliding window attention is now supported as an attribute
to the sdpa forward and bprop node. Usage:
`sdpa_attributes.set_sliding_window_length(window_length)`

[New feature] Bottom right aligned causal masking is now supported as an
attribute to the sdpa forward and bprop node. Usage:
`sdpa_attributes.use_causal_mask_bottom_right(true)`
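For context, a minimal sketch of the two new attributes on SDPA attribute objects; whether they can be combined on a single node depends on the cuDNN version, so they are shown separately, and `window_length` is assumed to be defined.

```cpp
namespace fe = cudnn_frontend;

// Sliding window attention: each query attends only to the last
// `window_length` keys (builds on the causal mask).
auto swa_attributes = fe::graph::SDPA_attributes()
                          .use_causal_mask(true)
                          .set_sliding_window_length(window_length);

// Bottom-right aligned causal mask, useful when s_q != s_kv.
auto brcm_attributes = fe::graph::SDPA_attributes()
                           .use_causal_mask_bottom_right(true);
```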

[New feature] SDPA bprop attributes can choose deterministic algorithm
using the `use_deterministic_algorithm` API.

[New feature] Allow users to filter a graph's candidate execution plans
by their shared memory usage in cudnn 9.2.0 and later.

[Bug fix] Fixed a runtime error that occurred when the chosen execution
plan candidate was incorrectly set in the backend. This would happen when
`check_support` did not correctly filter by the workspace size.

[Bug fix] Selecting/deselecting engines by behavior and numerical notes
has been fixed and now works as intended.

[Debugging] A new tool for easy reproduction of a failure using the json
representation of the graph can be found [here](tools/json_reproducer).

[Samples] Restructured the cpp samples into categories for easier
navigation.

[Samples] Added a sample to showcase how different plans can be built in
parallel in separate threads.

[Compilation enhancement] Added a new macro
`CUDNN_FRONTEND_SKIP_NLOHMANN_JSON` as a compilation flag to drop
nlohmann::json as a compilation dependency. Users lose access to certain
API functions like `print`, `key`, `serialize`, `deserialize` that
depend on the library.

[Enhancement] Serialization of resample operation is now supported.

[Enhancement] A bug template has been added for new GitHub issues.

v1.4.0

[New] Added a benchmark folder which contains a sample docker file to
compare the cudnn implementation of sdpa with the pytorch implementation. (#73)

[Enhancement] Once an engine is de-selected by name, it will not be
built as part of `check_support`.

[Enhancement] The cudnn backend search order for the pip wheels is as follows:
(a) dlopen `libcudnn.so.MAJOR_VERSION` from the site packages; (b) try to
dlopen the unversioned `libcudnn.so`. This way the PyPI cudnn package
`nvidia-cudnn-cu*` gets priority over the default search path.

[Enhancement] Allow embedding dimension up to 256 (previously limited to
128) in the sdpa fprop operation.

[Bug fix] Updated the scale and bias shapes in the batch norm sample.

v1.3.0

cudnn frontend v1.3 release notes. (#72)

[New API] Added new operations `sdpa_fp8_forward` and `sdpa_fp8_backward` to perform scaled dot product attention on fp8 tensors. See more details in `docs/operations/Attention.md` and the cpp sample in `samples/cpp/mha.cpp`. Pybinds for the fp8 nodes are also added.

[New API] Added a new operation for resample forward. Added a new sample `samples/cpp/resample.cpp` to show its usage.

[New API] Added a new API `deselect_engines(std::vector<std::string> const &engine_names)` which blocks certain engine configs from running.

[New API] Added new APIs `select_numeric_notes` and `select_behavior_notes` to allow users to select engine configs which have the chosen numeric and behavior notes, respectively.
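A hedged sketch of these selection APIs on an existing `graph`; the engine-name string and the note enum spellings below are illustrative assumptions.

```cpp
namespace fe = cudnn_frontend;

// Block a specific engine config from being considered at build time.
graph.deselect_engines({"eng0_k2=3_k13=0"});  // hypothetical engine name string

// Keep only engine configs that carry the chosen numeric / behavior notes
// (enum values are assumptions).
graph.select_numeric_notes({fe::NumericalNote_t::TENSOR_CORE});
graph.select_behavior_notes({fe::BehaviorNote_t::RUNTIME_COMPILATION});
```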

[Python API] Added a custom exception `cudnnGraphNotSupportedException` to the python API to distinguish graphs that are actually not supported from programming errors.

[Python API] Added a new `backend_version_string` which returns the backend version in canonical form (e.g. 9.1.0) instead of a version number.

[Bug Fix] Updated the workspace computation for the sdpa fprop node. Previously, workspace was calculated for alibi slopes irrespective of whether the alibi mask was turned on.

[Bug Fix] Fixed deserialization of half-precision pass-by-value tensors.

v1.2.1

cudnn frontend v1.2.1 release notes. (#69)

[Bug Fix] cudnn-frontend pip wheels will now dlopen the fully versioned
library (`libcudnn.so.8` or `libcudnn.so.9`) before trying to load
`libcudnn.so`. This means the pip wheels in the RUN_PATH will be
prioritized over system paths (the default behavior of dlopen). This can be
overridden by setting `LD_LIBRARY_PATH`. Source installation will
now automatically look for cudnn in site packages before the system path.

[Documentation] Fixed the google-colab links in the jupyter notebooks.

[Documentation] Added a jupyter notebook sample, `00_introduction.ipynb`, that goes over the basics of the cudnn FE graph API.