-
NVIDIA
- Taiwan
-
flash-attention Public
Forked from PaddlePaddle/flash-attention
Fast and memory-efficient exact attention
C++ BSD 3-Clause "New" or "Revised" License Updated Oct 1, 2024 -
Paddle Public
Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning (the PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment)
-
TransformerEngine Public
Forked from NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in bot…
Python Apache License 2.0 Updated Sep 8, 2023 -
CUDALibrarySamples Public
Forked from NVIDIA/CUDALibrarySamples
CUDA Library Samples
Cuda Other Updated May 8, 2023 -
openacc_fortran_examples Public
Simple OpenACC Fortran Examples
-
gpubootcamp Public
Forked from openhackathons-org/gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
Jupyter Notebook Apache License 2.0 Updated Jun 23, 2021 -
-
cuGemmProf Public
A simple tool to profile the performance of multiple cuBLAS GEMM combinations
-
amazon-dsstne Public
Forked from amazon-archives/amazon-dsstne
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models
C++ Apache License 2.0 Updated Mar 2, 2020 -
TensorRT Public
Forked from NVIDIA/TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
C++ Apache License 2.0 Updated Dec 25, 2019 -
FluidDoc Public
Forked from PaddlePaddle/docs
Documentation for PaddlePaddle
Shell Updated Dec 25, 2019 -
dlrm Public
Forked from facebookresearch/dlrm
An implementation of a deep learning recommendation model (DLRM)
Python MIT License Updated Sep 11, 2019 -
Installation instructions for numba and pyculib by pip, tested on Ubuntu.
Updated Aug 12, 2019 -
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
-
NumPy-like API accelerated with CUDA
Python MIT License Updated Jan 28, 2019 -
-
KerasToTensorRT Public
A simple demonstration of running a Keras model on TensorFlow with TensorRT integration (TF-TRT), or directly on TensorRT, without invoking "freeze_graph.py".
-
A simple demonstration of running the TensorFlow Inception v3 model on TensorRT.