HeyDavid633


Artifacts of EVT ASPLOS'24

Python 22 2 Updated Mar 6, 2024

FlagGems is an operator library for large language models implemented in Triton Language.

Python 381 55 Updated Dec 27, 2024

Memory-efficient multi layer perceptron implementation in OpenAI Triton.

Python 5 Updated Jul 10, 2023

Helpful tools and examples for working with flex-attention

Python 554 28 Updated Dec 13, 2024

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, this work computes attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 853 39 Updated Dec 28, 2024

A subscription conversion tool for Clash

Python 4 1 Updated Dec 24, 2024

C++ 28 14 Updated Dec 13, 2024

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 6,659 1,878 Updated Jul 26, 2024

C++ extensions in PyTorch

Python 1,036 217 Updated Aug 7, 2024

How to optimize some algorithms in CUDA.

Cuda 1,778 148 Updated Dec 28, 2024

A collection of compiler learning resources.

Python 2,217 339 Updated May 27, 2024

Real Transformer TeraFLOPS on various GPUs

Jupyter Notebook 885 110 Updated Jan 9, 2024

Development repository for the Triton language and compiler

C++ 13,838 1,688 Updated Dec 29, 2024

A local chatbot fine-tuned on bilibili user comments.

Python 3,148 365 Updated May 15, 2024

A self-learning tutorial for CUDA high-performance programming.

JavaScript 2 1 Updated Nov 11, 2024

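Study notes on high-performance computing, including the notes themselves and code demos for the related topics; continuously being improved. If you find it helpful, please give it a Star; it helps the author a lot. Thanks!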

Jupyter Notebook 390 35 Updated Mar 28, 2023

Compiler Infrastructure for Neural Networks

C++ 144 114 Updated Jul 18, 2023

An Easy-to-understand TensorOp Matmul Tutorial

C++ 301 32 Updated Sep 21, 2024

Practice on cifar100(ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext,ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet…

Python 4,353 1,182 Updated Jul 15, 2024

My solutions to the Glomers Challenge: a series of distributed systems challenges.

Go 113 7 Updated Mar 1, 2023

Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course.

Jupyter Notebook 11,510 3,353 Updated Sep 12, 2024

A Zsh theme

Shell 47,162 2,214 Updated Dec 29, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 32,760 4,990 Updated Dec 29, 2024

Abstraction your words——never mind the scandal and liber

Python 186 8 Updated Mar 25, 2020

A purely front-end converter for "abstract speech" (抽象话)

HTML 422 33 Updated Dec 2, 2019

Fast and memory-efficient exact attention

Python 14,823 1,397 Updated Dec 29, 2024

Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides (SpTRSM)

C 12 4 Updated Feb 14, 2020

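AISystem mainly refers to AI systems, covering the full stack of low-level AI technologies, including AI chips, AI compilers, and AI inference and training frameworks.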

Jupyter Notebook 11,732 1,696 Updated Dec 7, 2024

Transformer-related optimization, including BERT and GPT

C++ 5,957 897 Updated Mar 27, 2024