View irasin's full-sized avatar

11 5 Updated Apr 27, 2013

Tongyi Qianwen (Qwen) vLLM inference deployment demo

Python 510 74 Updated Mar 28, 2024

Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.

C++ 263 50 Updated Jan 13, 2025

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,221 530 Updated Feb 12, 2025

We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…

C++ 175 11 Updated Jan 28, 2025

C++ Tip Of The Week

Python 1,595 74 Updated Nov 25, 2024

An easy-to-understand TensorOp Matmul tutorial

C++ 316 35 Updated Sep 21, 2024

Tile primitives for speedy kernels

Cuda 2,021 111 Updated Feb 14, 2025

Random for modern C++ with convenient API

C++ 915 82 Updated Feb 9, 2025

C++20 μ(micro)/Unit Testing Framework

C++ 1,303 126 Updated Feb 11, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 7,836 594 Updated Aug 18, 2024

Resources on LLM-based applications built with the RAG pattern

1,082 61 Updated Jan 22, 2025

A simple example of implementing RAG with LangChain

Jupyter Notebook 382 61 Updated Jan 2, 2025

A comprehensive guide to building RAG-based LLM applications for production.

Jupyter Notebook 1,760 240 Updated Aug 2, 2024
80 12 Updated Jun 26, 2023

A simple high performance CUDA GEMM implementation.

Cuda 346 39 Updated Jan 4, 2024
Cuda 108 15 Updated Mar 18, 2024

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,301 247 Updated Feb 7, 2025

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 355 31 Updated Feb 7, 2025

Benchmark code for the "Online normalizer calculation for softmax" paper

Cuda 66 7 Updated Jul 27, 2018

A collection of benchmarks to measure basic GPU capabilities

C++ 290 44 Updated Feb 11, 2025

C++ project template with unit-tests, documentation, ci-testing and workflows.

CMake 249 95 Updated Jul 15, 2024

An extension library of WMMA API (Tensor Core API)

Cuda 87 14 Updated Jul 12, 2024

A collection of out-of-tree Clang plugins for teaching and learning

C++ 720 64 Updated Nov 24, 2024

Rich is a Python library for rich text and beautiful formatting in the terminal.

Python 50,727 1,781 Updated Dec 2, 2024

Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)

Cuda 124 19 Updated Aug 18, 2020

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda 342 70 Updated Sep 8, 2024

LLVM tutorial documentation, translation, and code repository

C++ 162 25 Updated Oct 9, 2023

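Illustrated guides to computer networks, operating systems, computer organization, and databases: 1,000 diagrams plus 500,000 words that demystify obscure computer science fundamentals, so that no rote interview question is hard to understand! 🚀 Read online: https://xiaolincoding.com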

15,247 1,910 Updated Nov 27, 2024