Dazz993
  • University of Toronto
  • Toronto, CA

Highlights

  • Pro

240 starred repositories (filtered to source repositories)

[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization

Python · 1 star · Updated Feb 7, 2025

LLM Finetuning with peft

Jupyter Notebook · 2,336 stars · 636 forks · Updated Feb 18, 2025
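
The entry above points to notebooks built on Hugging Face's peft library. A minimal LoRA fine-tuning sketch is shown below; the model name, target modules, and hyperparameters are illustrative placeholders, not taken from this repository.

```python
# Minimal LoRA sketch with Hugging Face peft (illustrative; assumes the
# `transformers` and `peft` packages are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "gpt2"  # placeholder small model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # low-rank adapter dimension
    lora_alpha=16,              # LoRA scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```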

Pretraining code for a large-scale depth-recurrent language model

Python · 610 stars · 50 forks · Updated Feb 14, 2025

[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection

Python · 39 stars · 4 forks · Updated Oct 29, 2024
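
The idea named in this entry, best-of-N decoding sped up by rejecting weak candidates early, can be illustrated with a dependency-free toy sketch; `extend` and `score` below are hypothetical placeholders for a decoder step and a reward model, and this is not the paper's implementation.

```python
# Toy sketch: grow N candidates in chunks, periodically score the partial
# outputs, and prune the weakest ones instead of finishing all N.
from typing import Callable, List

def best_of_n_with_rejection(
    prompt: str,
    n: int,
    rounds: int,
    extend: Callable[[str], str],   # appends one chunk of text to a candidate
    score: Callable[[str], float],  # partial/full reward, higher is better
    keep_fraction: float = 0.5,
) -> str:
    candidates: List[str] = [prompt] * n
    for _ in range(rounds):
        candidates = [extend(c) for c in candidates]
        # Keep only the most promising fraction so later rounds spend
        # compute on fewer, better continuations.
        candidates.sort(key=score, reverse=True)
        keep = max(1, int(len(candidates) * keep_fraction))
        candidates = candidates[:keep]
    return max(candidates, key=score)
```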

Material for gpu-mode lectures

Jupyter Notebook · 3,784 stars · 382 forks · Updated Feb 9, 2025

Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning

C++ · 23 stars · 1 fork · Updated Dec 12, 2024

The ASPLOS 2025 / EuroSys 2025 Contest Track

27 stars · 2 forks · Updated Feb 22, 2025

Optimizing inference proxy for LLMs

Python · 2,052 stars · 159 forks · Updated Feb 23, 2025
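
An inference proxy of this kind typically sits behind an OpenAI-compatible endpoint, so clients only change their base URL. The sketch below uses the `openai` Python client; the port, path, and model name are assumptions for illustration, so check the proxy's README for its actual defaults.

```python
# Calling an OpenAI-compatible inference proxy (endpoint and model name are
# assumed placeholders, not this project's documented defaults).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local proxy endpoint
    api_key="sk-placeholder",             # forwarded or ignored by the proxy
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model routed through the proxy
    messages=[{"role": "user", "content": "Summarize speculative decoding in one sentence."}],
)
print(resp.choices[0].message.content)
```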

Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"

Python · 294 stars · 27 forks · Updated Nov 19, 2024

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python · 151 stars · 6 forks · Updated Oct 30, 2024

Differentiable Combinatorial Scheduling at Scale (ICML'24). Mingju Liu, Yingjie Li, Jiaqi Yin, Zhiru Zhang, Cunxi Yu.

Python · 19 stars · 1 fork · Updated Oct 31, 2024

Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.

Python · 162 stars · 12 forks · Updated Feb 12, 2025

Awesome list for LLM pruning.

203 stars · 8 forks · Updated Dec 15, 2024

Triton-based implementation of Sparse Mixture of Experts.

Python · 197 stars · 16 forks · Updated Nov 28, 2024

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ · 747 stars · 44 forks · Updated Feb 21, 2025

Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.

102 stars · 1 fork · Updated Aug 21, 2024

RDMA and SHARP plugins for the NCCL library

C · 176 stars · 34 forks · Updated Jan 22, 2025

A modular, extensible LLM inference benchmarking framework that supports multiple benchmarking frameworks and paradigms.

Python · 8 stars · Updated Feb 21, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python · 1,285 stars · 104 forks · Updated Feb 10, 2025

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4,779 stars · 486 forks · Updated Sep 25, 2024
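
DeepSeek-V2 checkpoints are distributed through Hugging Face, so a hedged loading sketch with `transformers` follows; the model ID and dtype are assumptions to be checked against the model card, and the custom MoE/MLA modeling code generally requires `trust_remote_code=True`.

```python
# Loading a DeepSeek-V2 checkpoint with Hugging Face transformers (sketch;
# model ID and dtype are assumptions, verify against the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",        # needs `accelerate` for automatic placement
    trust_remote_code=True,   # the repo ships custom MoE/MLA modeling code
)

inputs = tokenizer("Mixture-of-Experts models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```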

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python · 22,714 stars · 2,257 forks · Updated Feb 21, 2025

CUDA Python: Performance meets Productivity

Python · 1,107 stars · 92 forks · Updated Feb 22, 2025
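
cuda-python mirrors the CUDA driver API into Python, with each call returning a tuple that starts with an error code. A minimal device-query sketch follows; the exact module layout and signatures vary by release, so treat this as an assumption and confirm against the cuda-python docs.

```python
# Minimal device query with cuda-python's low-level driver bindings (sketch;
# verify function names and return shapes against the installed version).
from cuda import cuda

def check(err):
    # Every binding returns (CUresult, *values); fail loudly on errors.
    if err != cuda.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error: {err}")

err, = cuda.cuInit(0)
check(err)
err, device = cuda.cuDeviceGet(0)
check(err)
err, name = cuda.cuDeviceGetName(64, device)
check(err)
print(name.decode(errors="ignore").rstrip("\x00"))
```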

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ · 298 stars · 25 forks · Updated Oct 30, 2024

📖 A curated list of Awesome Diffusion Inference Papers with code: Sampling, Caching, Multi-GPUs, etc. 🎉🎉

193 stars · 12 forks · Updated Jan 16, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ · 298 stars · 45 forks · Updated Feb 22, 2025

Using GPT to parse PDF

Python · 3,252 stars · 233 forks · Updated Aug 7, 2024

NCCL Profiling Kit

Python · 127 stars · 12 forks · Updated Jul 1, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 2,615 stars · 161 forks · Updated Feb 23, 2025

Kubernetes controller for GitHub Actions self-hosted runners

Go · 4,977 stars · 1,171 forks · Updated Feb 21, 2025