Siyuan Feng Hzfengsy

ML System & Compiler | ASF Member | PMC Member of Apache TVM

512 followers · 55 following

SJTU
Shanghai
https://syfeng.net

Stars

mlc-ai / mlc-python

C++ 21 5 Updated Jan 9, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 7,247 694 Updated Jan 12, 2025

deepseek-ai / DeepSeek-V3

Python 18,407 1,467 Updated Jan 7, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,345 134 Updated Jan 10, 2025

volcengine / verl

veRL: Volcano Engine Reinforcement Learning for LLM

Python 641 50 Updated Jan 12, 2025

github / gitignore

A collection of useful .gitignore templates

163,620 83,111 Updated Jan 9, 2025

evalplus / evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,323 116 Updated Jan 6, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,579 5,132 Updated Jan 12, 2025

Open-Source-O1 / Open-O1

Python 1,127 41 Updated Nov 21, 2024

pytorch / torchtitan

A PyTorch native library for large model training

Python 3,031 240 Updated Jan 10, 2025

openpsi-project / ReaLHF

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python 191 10 Updated Dec 30, 2024

bytedance / flux

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 268 24 Updated Oct 30, 2024

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 11,072 2,475 Updated Jan 12, 2025

FlagOpen / FlagGems

FlagGems is an operator library for large language models implemented in Triton Language.

Python 392 58 Updated Jan 11, 2025

ArkMowers / arknights-mower

Arknights (《明日方舟》) idle-period assistant

Python 524 54 Updated Jan 6, 2025

Cambricon / triton-linalg

Development repository for the Triton-Linalg conversion

C++ 167 15 Updated Dec 25, 2024

philipturner / metal-benchmarks

Apple GPU microarchitecture

Metal 489 20 Updated Sep 22, 2024

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ 18,276 1,053 Updated Jan 12, 2025

microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 494 37 Updated Jan 11, 2025

HuangOwen / Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

1,311 86 Updated Jan 10, 2025

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 137,470 27,527 Updated Jan 11, 2025

triton-lang / triton

Development repository for the Triton language and compiler

C++ 13,982 1,701 Updated Jan 12, 2025

nox-410 / tvm.tl

An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.

Python 51 2 Updated Jul 23, 2024

pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 5,753 523 Updated Dec 14, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 1,766 177 Updated Jan 9, 2025

mlc-ai / docs

The documents for TVM Unity

Shell 11 2 Updated Aug 9, 2024

ChatGPTNextWeb / ChatGPT-Next-Web

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini/Claude LLM app with one click.

TypeScript 78,503 60,018 Updated Jan 12, 2025

krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities

C++ 423 118 Updated Dec 24, 2024

mlc-ai / llm-perf-bench

Python 116 13 Updated Apr 22, 2024

turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Python 2,796 222 Updated Sep 30, 2023

Lists (3)

🚀 My stack

📖 Research

🛠️ tools
