17 results for source starred repositories

FlashInfer Nightly

6 2 Updated Dec 24, 2024

MagicPIG: LSH Sampling for Efficient LLM Generation

Python 160 8 Updated Dec 16, 2024

Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

Python 11 Updated Jul 3, 2024

Sirius: an efficient correction mechanism that significantly boosts contextual sparsity models on reasoning tasks while maintaining their efficiency gains.

Python 20 4 Updated Sep 10, 2024

C++ extensions in PyTorch

Python 1,034 216 Updated Aug 7, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,617 161 Updated Dec 24, 2024

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,226 520 Updated Dec 25, 2024

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 5,722 521 Updated Dec 14, 2024

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 237 13 Updated Aug 31, 2024

Scalable and robust tree-based speculative decoding algorithm

Python 322 37 Updated Aug 13, 2024

Codebase for "SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems"

C++ 1,092 169 Updated Apr 13, 2021

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 267 43 Updated Dec 23, 2024

Microsoft Collective Communication Library

C++ 327 30 Updated Sep 20, 2023

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,740 233 Updated Dec 25, 2024

Python 193 30 Updated Dec 25, 2023