irasin

Sin irasin

everything for fun

33 followers · 7 following

Achievements

Lists (3)

🔮 Future ideas

✨ Inspiration

🚀 My stack

Stars

tempdban / docs

11 5 Updated Apr 27, 2013

owenliang / qwen-vllm

Tongyi Qianwen (Qwen) vLLM inference deployment demo

Python 510 74 Updated Mar 28, 2024

codeplaysoftware / portBLAS

Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.

C++ 263 50 Updated Jan 13, 2025

gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,221 530 Updated Feb 12, 2025

TiledTensor / TiledCUDA

We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…

C++ 175 11 Updated Jan 28, 2025

tip-of-the-week / cpp

C++ Tip Of The Week

Python 1,595 74 Updated Nov 25, 2024

KnowingNothing / MatmulTutorial

An easy-to-understand TensorOp Matmul tutorial

C++ 316 35 Updated Sep 21, 2024

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,021 111 Updated Feb 14, 2025

ilqvya / random

Random for modern C++ with convenient API

C++ 915 82 Updated Feb 9, 2025

boost-ext / ut

C++20 μ(micro)/Unit Testing Framework

C++ 1,303 126 Updated Feb 11, 2025

adam-maj / tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 7,836 594 Updated Aug 18, 2024

lizhe2004 / Awesome-LLM-RAG-Application

Resources on applications built on LLMs with the RAG pattern

1,082 61 Updated Jan 22, 2025

blackinkkkxi / RAG_langchain

A simple example of implementing RAG with langchain

Jupyter Notebook 382 61 Updated Jan 2, 2025

ray-project / llm-applications

A comprehensive guide to building RAG-based LLM applications for production.

Jupyter Notebook 1,760 240 Updated Aug 2, 2024

MARD1NO / CUDA-PPT

80 12 Updated Jun 26, 2023

nicolaswilde / cuda-tensorcore-hgemm

Cuda 129 24 Updated Dec 26, 2024

Cjkkkk / CUDA_gemm

A simple high performance CUDA GEMM implementation.

Cuda 346 39 Updated Jan 4, 2024

AyakaGEMM / Hands-on-GEMM

Cuda 108 15 Updated Mar 18, 2024

DefTruth / CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,301 247 Updated Feb 7, 2025

NVIDIA / nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 355 31 Updated Feb 7, 2025

NVIDIA / online-softmax

Benchmark code for the "Online normalizer calculation for softmax" paper

Cuda 66 7 Updated Jul 27, 2018

RRZE-HPC / gpu-benches

A collection of benchmarks to measure basic GPU capabilities

C++ 290 44 Updated Feb 11, 2025

franneck94 / CppProjectTemplate

C++ project template with unit-tests, documentation, ci-testing and workflows.

CMake 249 95 Updated Jul 15, 2024

wmmae / wmma_extension

An extension library of WMMA API (Tensor Core API)

Cuda 87 14 Updated Jul 12, 2024

banach-space / clang-tutor

A collection of out-of-tree Clang plugins for teaching and learning

C++ 720 64 Updated Nov 24, 2024

Textualize / rich

Rich is a Python library for rich text and beautiful formatting in the terminal.

Python 50,727 1,781 Updated Dec 2, 2024

wzsh / wmma_tensorcore_sample

Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)

Cuda 124 19 Updated Aug 18, 2020

Bruce-Lee-LY / cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda 342 70 Updated Sep 8, 2024

hunterzju / llvm-tutorial

llvm-tutorial documentation, translation, and code repository

C++ 162 25 Updated Oct 9, 2023

xiaolincoder / CS-Base

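Illustrated guides to computer networking, operating systems, computer organization, and databases: 1,000 diagrams plus 500,000 characters of text that demystify obscure computer science fundamentals, so that no stock interview question is hard to understand! 🚀 Read online: https://xiaolincoding.com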

15,247 1,910 Updated Nov 27, 2024