NVIDIA (Tokyo, UTC +09:00)
Stars
Code and data for paper "Deep Painterly Harmonization": https://arxiv.org/abs/1804.03189
Fully Convolutional Instance-aware Semantic Segmentation
Efficient GPU kernels for block-sparse matrix multiplication and convolution
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Reference implementation of real-time autoregressive WaveNet inference
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
WholeGraph: large-scale Graph Neural Networks
An optimized parallel tiled approach to matrix multiplication that exploits the lower-latency, higher-bandwidth shared memory within GPU thread blocks.
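The last entry describes the classic tiling technique for matrix multiplication. As a rough illustration only (a pure-Python sketch, not code from that repository; on a real GPU each tile would be staged in `__shared__` memory by a CUDA thread block), the decomposition looks like this:

```python
# Hypothetical sketch of tiled matrix multiplication C = A * B.
# The computation is split into TILE x TILE blocks; on a GPU each block
# of A and B is loaded once into low-latency shared memory and reused,
# which is the bandwidth saving the repository description refers to.

TILE = 2  # illustrative tile size; real kernels tune this to the hardware

def tiled_matmul(A, B):
    """Multiply square matrices A and B (lists of lists) tile by tile."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, TILE):          # tile row of C
        for bj in range(0, n, TILE):      # tile column of C
            for bk in range(0, n, TILE):  # tiles along the shared dimension
                # This inner region corresponds to one "load tiles into
                # shared memory, then accumulate" step of a GPU kernel.
                for i in range(bi, min(bi + TILE, n)):
                    for j in range(bj, min(bj + TILE, n)):
                        acc = 0.0
                        for k in range(bk, min(bk + TILE, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] += acc
    return C
```

The key design point is that every element of a tile is fetched from slow memory once and then reused TILE times, which is what makes the shared-memory version on a GPU faster than a naive per-element kernel.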