Tags: zcyKTH/cudnn-frontend
V0.5.1 patch (NVIDIA#24)
* Update the timing code in the cudnn find plan to include the stream ID on which it was launched.
* Fix a typo in CMakeLists.txt.
* Fix compilation warnings in multiple files with the latest GCC.
Co-authored-by: Anerudhan Gopal <[email protected]>
Patch 0.4.1 (NVIDIA#7)
* Release 0.4.1
  [Bug Fix]: Fixed an issue where the vector count was not copied over during the move construction phase.
  [Samples]: New sample added for IMMA. Added an errata filter which blocks non-TensorCore engines from running it.
  [CleanUp]: Changed all move constructors and fixed the move assignment operator.
* Rename getDimension in Convolution to spatial dimension for clarity.
Co-authored-by: agopal <[email protected]>
* [New API]: Added a new function, get_heuristics_list, which accepts a list of heuristic modes and returns a concatenated list of the engine heuristics.
* [New Feature]: New heuristic mode (HEUR_MODE_FALLBACK) added to the backend. The sample is updated to use it and provides a generic way to access the fallback engines. FallbackEngineList is retained as a way to add custom engines in the frontend.
* [New Feature]: Added support for setting the vectorization dimension and vectorization count attributes in the tensor descriptor.
* [Rename]: setDataType in OperationBuilder is deprecated and replaced with the clearer setComputePrecision().
* [CleanUp]: cudnnFindPlan and cudnnGetPlan take an L-value operationGraph rather than an R-value as before.
* [CleanUp]: cudnnFindPlan and time_sorted_plan return executionPlans_t (a vector of plans) instead of executionOptions_t (a vector of structs containing a plan and its time). This achieves compatibility with the cudnnGet.
* [Samples]: New sample added for DP4A.
* [Samples]: ConvBiasScaleRelu sample.
* [Bug fix]: The errata filter was erroneously filtering out unspecified engines.
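The get_heuristics_list entry above describes querying several heuristic modes and concatenating the results. The following is a minimal, self-contained sketch of that idea only. It does not use the cudnn_frontend API: `EngineConfig`, `HeuristicQuery`, `concat_heuristics`, `toy_query`, the mode labels, and the engine IDs are all hypothetical names and values made up for illustration.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an engine configuration; NOT the cudnn_frontend type.
struct EngineConfig {
    std::string mode;  // heuristic mode that proposed this engine
    int engine_id;     // illustrative engine identifier
};

using EngineConfigList = std::vector<EngineConfig>;

// Hypothetical per-mode query: maps a mode label to the configs it proposes.
using HeuristicQuery = std::function<EngineConfigList(const std::string&)>;

// Query each mode in the order given and append its results to a single list,
// so configs from earlier modes come before configs from later modes.
EngineConfigList concat_heuristics(const std::vector<std::string>& modes,
                                   const HeuristicQuery& query) {
    EngineConfigList all;
    for (const auto& mode : modes) {
        EngineConfigList part = query(mode);
        all.insert(all.end(), part.begin(), part.end());
    }
    return all;
}

// Toy query used only for illustration; the labels and engine IDs are assumptions.
EngineConfigList toy_query(const std::string& mode) {
    if (mode == "heuristics_instant") {
        return EngineConfigList{EngineConfig{mode, 1}, EngineConfig{mode, 5}};
    }
    if (mode == "heuristics_fallback") {
        return EngineConfigList{EngineConfig{mode, 0}};
    }
    return EngineConfigList{};  // unknown mode: contributes nothing
}
```

Calling `concat_heuristics({"heuristics_instant", "heuristics_fallback"}, toy_query)` in this sketch yields the two "instant" configs followed by the single "fallback" config. Listing a fallback mode last is one plausible way to keep fallback engines as a last resort, though the actual ordering and filtering behavior of the library should be checked against its headers.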
Merge pull request NVIDIA#2 from NVIDIA/staging
Changes in pull request:
* Fix compilation warnings reported with the -Wall and -Wextra flags.
* Support for backward activations dx = f(dy, X).
* Support for the lower_clip, upper_clip, lower_clip_slope, alpha and beta parameters for relu, elu, softplus and swish.
* Added additional checks during the build phase, such as for bDesc being nullptr.
* Improved error checking for xDesc and yDesc depending on whether the operation is convolution or pointwise.
* Add matmul descriptor.
* Add conv_scale_bias_add_relu and matmul_bias_gelu samples.
* Comparison between frontend and backend.
* Fix compilation issue in samples for gcc-5.
* New sample for HEUR_B.