UCAS, Beijing (UTC+08:00)
Stars
- A library for efficient similarity search and clustering of dense vectors.
- A library that provides an embeddable, persistent key-value store for fast storage.
- Productive, portable, and performant GPU programming in Python.
- Development repository for the Triton language and compiler.
- Open-source simulator for autonomous driving research.
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
- Ethereum miner with OpenCL, CUDA, and stratum support.
- HIP: C++ Heterogeneous-Compute Interface for Portability.
- Fast inference engine for Transformer models.
- Optimized primitives for collective multi-GPU communication.
- A machine learning compiler for GPUs, CPUs, and ML accelerators.
- Automatically discovering fast parallelization strategies for distributed deep neural network training.
- A fast and user-friendly runtime for Transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
- A polyhedral compiler for expressing fast and portable data-parallel algorithms.
- A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology.
- A software library containing FFT functions written in OpenCL.
- Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
- Optimized BERT Transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
- ROCm Platform Runtime: ROCr, an HSA-based runtime enhanced for the HPC market.
- GPU scheduler for deep learning.
- CLRadeonExtender (GCN assembler, Radeon assembler) mirror.