Stars
[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
LLM fine-tuning with PEFT
Pretraining code for a large-scale depth-recurrent language model
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Differentiable Combinatorial Scheduling at Scale (ICML'24). Mingju Liu, Yingjie Li, Jiaqi Yin, Zhiru Zhang, Cunxi Yu.
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
Triton-based implementation of Sparse Mixture of Experts.
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
RDMA and SHARP plugins for the NCCL library
A modular, extensible LLM inference benchmarking framework that supports multiple benchmarking harnesses and paradigms.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
A modular graph-based Retrieval-Augmented Generation (RAG) system
CUDA Python: Performance meets Productivity
A fast communication-overlapping library for tensor parallelism on GPUs.
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉
MSCCL++: A GPU-driven communication stack for scalable AI applications
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Kubernetes controller for GitHub Actions self-hosted runners