UCAS, Beijing (UTC+08:00)
Stars
- A library for efficient similarity search and clustering of dense vectors.
- A library that provides an embeddable, persistent key-value store for fast storage.
- Productive, portable, and performant GPU programming in Python.
- Development repository for the Triton language and compiler.
- Open-source simulator for autonomous driving research.
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
- Ethereum miner with OpenCL, CUDA, and stratum support.
- HIP: C++ Heterogeneous-Compute Interface for Portability.
- Fast inference engine for Transformer models.
- Optimized primitives for collective multi-GPU communication.
- A machine learning compiler for GPUs, CPUs, and ML accelerators.
- Automatically discovering fast parallelization strategies for distributed deep neural network training.
- A fast and user-friendly runtime for Transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
- A polyhedral compiler for expressing fast and portable data-parallel algorithms.
- A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology.
- A software library containing FFT functions written in OpenCL.
- Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
- Optimized BERT Transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
- ROCm Platform Runtime: ROCr, an HSA-based runtime enhanced for the HPC market.
- GPU scheduler for deep learning.
- CLRadeonExtender (GCN assembler, Radeon assembler) mirror.