LinkZyy
🎯
Focusing
  • UCAS
  • Beijing

Highlights

  • Pro

33 stars written in C++

LLM inference in C/C++

C++ 69,513 10,023 Updated Dec 20, 2024

A library for efficient similarity search and clustering of dense vectors.

C++ 32,021 3,678 Updated Dec 20, 2024

A library that provides an embeddable, persistent key-value store for fast storage.

C++ 28,834 6,362 Updated Dec 20, 2024

Productive, portable, and performant GPU programming in Python.

C++ 25,927 2,295 Updated Dec 20, 2024

Development repository for the Triton language and compiler

C++ 13,737 1,684 Updated Dec 20, 2024

Open-source simulator for autonomous driving research.

C++ 11,731 3,769 Updated Dec 20, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,934 1,030 Updated Dec 17, 2024

Ethereum miner with OpenCL, CUDA and stratum support

C++ 5,976 2,288 Updated Nov 1, 2023

CUDA Templates for Linear Algebra Subroutines

C++ 5,851 1,010 Updated Dec 11, 2024

HIP: C++ Heterogeneous-Compute Interface for Portability

C++ 3,808 540 Updated Dec 20, 2024

Fast inference engine for Transformer models

C++ 3,473 309 Updated Dec 18, 2024

Optimized primitives for collective multi-GPU communication

C++ 3,316 837 Updated Sep 17, 2024

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,789 453 Updated Dec 20, 2024

A C++ remake of 《金庸群侠传》 (Heroes of Jin Yong); completed.

C++ 2,642 375 Updated Oct 2, 2024

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,738 234 Updated Dec 20, 2024

a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.

C++ 1,489 198 Updated Jun 12, 2023

XLS: Accelerated HW Synthesis

C++ 1,219 182 Updated Dec 20, 2024

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 969 163 Updated Sep 19, 2024

A polyhedral compiler for expressing fast and portable data parallel algorithms

C++ 921 133 Updated Nov 20, 2024

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 909 146 Updated Dec 16, 2024

a software library containing FFT functions written in OpenCL

C++ 624 192 Updated Oct 5, 2022

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 564 79 Updated Sep 11, 2024

optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052

C++ 463 37 Updated Mar 15, 2024

ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime

C++ 229 111 Updated Dec 20, 2024

GPU-scheduler-for-deep-learning

C++ 200 34 Updated Nov 5, 2020

build scripts for ROCm

C++ 181 34 Updated Jan 11, 2024

C++ 110 51 Updated Dec 20, 2024

SDAccel Development Environment Tutorials

C++ 107 71 Updated Apr 8, 2020

CLRadeonExtender (GCN assembler, Radeon assembler) mirror

C++ 97 28 Updated Jun 15, 2021

Mitsuba time-of-flight renderer.

C++ 54 22 Updated Jul 2, 2024