Hzfengsy

Organizations

@apache @cityflow-project @tlc-pack @mlc-ai

Repositories (each entry: description, then language, stars, forks, and last update date)
C++ 21 5 Updated Jan 9, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 7,247 694 Updated Jan 12, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,345 134 Updated Jan 10, 2025

veRL: Volcano Engine Reinforcement Learning for LLM

Python 641 50 Updated Jan 12, 2025

A collection of useful .gitignore templates

163,620 83,111 Updated Jan 9, 2025

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,323 116 Updated Jan 6, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,579 5,132 Updated Jan 12, 2025
Python 1,127 41 Updated Nov 21, 2024

A PyTorch native library for large model training

Python 3,031 240 Updated Jan 10, 2025

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python 191 10 Updated Dec 30, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 268 24 Updated Oct 30, 2024

Ongoing research training transformer models at scale

Python 11,072 2,475 Updated Jan 12, 2025

FlagGems is an operator library for large language models implemented in Triton Language.

Python 392 58 Updated Jan 11, 2025

An assistant for Arknights (《明日方舟》) during the game's idle periods

Python 524 54 Updated Jan 6, 2025

Development repository for the Triton-Linalg conversion

C++ 167 15 Updated Dec 25, 2024

Apple GPU microarchitecture

Metal 489 20 Updated Sep 22, 2024

MLX: An array framework for Apple silicon

C++ 18,276 1,053 Updated Jan 12, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 494 37 Updated Jan 11, 2025

Awesome LLM compression research papers and tools.

1,311 86 Updated Jan 10, 2025

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 137,470 27,527 Updated Jan 11, 2025

Development repository for the Triton language and compiler

C++ 13,982 1,701 Updated Jan 12, 2025

An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.

Python 51 2 Updated Jul 23, 2024

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 5,753 523 Updated Dec 14, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,766 177 Updated Jan 9, 2025

The documents for TVM Unity

Shell 11 2 Updated Aug 9, 2024

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini/Claude LLM app with one click.

TypeScript 78,503 60,018 Updated Jan 12, 2025

A tool which profiles OpenCL devices to find their peak capacities

C++ 423 118 Updated Dec 24, 2024
Python 116 13 Updated Apr 22, 2024

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Python 2,796 222 Updated Sep 30, 2023