San Francisco Bay Area, CA
Stars
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Starlark implementation of bazel rules for CUDA.
This repo holds the extended examples for rules_cuda.
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xin…
Dire Wolf is a software "soundcard" AX.25 packet modem/TNC and APRS encoder/decoder. It can be used stand-alone to observe APRS traffic, as a tracker, digipeater, APRStt gateway, or Internet Gatewa…
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
A curated list of projects related to the reMarkable tablet
Make huge neural nets fit in memory
Deprecated - see our other repos for Bazel examples
For publishing the source for UG1352 "Get Moving with Alveo"
The code for the ebook Ray Tracing in One Weekend by Peter Shirley translated to CUDA by Roger Allen. This work is in the public domain.
Brevitas: neural network quantization in PyTorch
Performance writing to GPIO with CPU and DMA on the Raspberry Pi
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Documentation of NVIDIA chip/hardware interfaces
Source code examples from the Parallel Forall Blog