Zhanda Zhu (Dazz993)

UofT. MLSys.

84 followers · 56 following

University of Toronto
Toronto, CA
zhandazhu.com

Lists (1)

🔮 Future ideas

3 repositories

Stars

30 stars written in C++

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ 19,237 1,096 Updated Feb 23, 2025

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,494 1,112 Updated Feb 21, 2025

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 8,114 424 Updated Feb 19, 2025

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 6,429 1,087 Updated Feb 21, 2025

NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT

C++ 6,037 899 Updated Mar 27, 2024

halide / Halide

a language for fast, portable data-parallel computation

C++ 5,972 1,078 Updated Feb 20, 2025

NVlabs / tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

C++ 3,887 479 Updated Jan 27, 2025

openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,972 509 Updated Feb 23, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,615 161 Updated Feb 23, 2025

tensor-compiler / taco

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ 1,279 191 Updated Apr 14, 2024

gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,228 535 Updated Feb 15, 2025

NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 959 149 Updated Feb 18, 2025

onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 816 337 Updated Feb 20, 2025

mirage-project / mirage

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 747 44 Updated Feb 21, 2025

stdrc / modern-cmake-by-example

IPADS lab new-member training, lecture 2: CMake (2021.11.3)

C++ 623 83 Updated Feb 16, 2025

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

C++ 456 48 Updated Feb 19, 2025