Starred repositories: 3 stars, written in CUDA
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Faster depthwise convolutions for PyTorch
Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes.