KoalaYuFeng
  • National University of Singapore
  • Singapore
  • X @feng_seu

Organizations: @Xtra-Computing


Heterogeneous Accelerated Compute Cluster (HACC) Resources Page

20 stars · 26 forks · Updated Jan 23, 2025
Python · 272 stars · 277 forks · Updated Jan 23, 2025

A framework for few-shot evaluation of language models.

Python · 7,621 stars · 2,049 forks · Updated Jan 31, 2025

x-dpu project

1 star · Updated Dec 6, 2024
C++ · 4 stars · Updated Jun 7, 2024

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization (https://arxiv.org/pdf/2401.06118.pdf) and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python · 1,210 stars · 183 forks · Updated Dec 26, 2024

VPTQ, a flexible and extreme low-bit quantization algorithm

Python · 574 stars · 39 forks · Updated Jan 21, 2025
Python · 35 stars · 6 forks · Updated Oct 8, 2024

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

C++ · 213 stars · 21 forks · Updated Sep 30, 2024
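The entry above concerns low-bit quantization kernels. As a point of reference, here is a minimal pure-Python sketch of uniform symmetric quantization to an arbitrary bit-width — the basic operation such libraries accelerate. All names below are made up for illustration; this is not the library's API.

```python
# Hypothetical sketch: uniform symmetric quantization to b bits.
# Floats are mapped to signed integers in [-(2**(b-1)-1), 2**(b-1)-1].

def quantize(values, bits):
    """Quantize a list of floats to signed integers plus a scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid zero scale
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from integers and the scale."""
    return [x * scale for x in q]

w = [0.5, -1.0, 0.25, 0.75]
q, s = quantize(w, 4)        # 4-bit: integers lie in [-7, 7]
w_hat = dequantize(q, s)     # approximate reconstruction of w
```

Lower bit-widths shrink the integer range and raise the reconstruction error, which is the accuracy/efficiency trade-off these kernels navigate.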

Python library for data stream learning

Python · 28 stars · Updated Sep 11, 2024

RapidStream TAPA compiles task-parallel HLS programs into high-frequency FPGA accelerators.

C++ · 163 stars · 34 forks · Updated Feb 2, 2025

Low-bit LLM inference on CPU with lookup table

C++ · 659 stars · 49 forks · Updated Jan 9, 2025
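Lookup-table-based low-bit inference replaces multiplications with table lookups: with 1-bit weights (+1/-1), a group of four weights has only 2**4 = 16 sign patterns, so the partial sum for every pattern can be precomputed once per activation vector and reused. A toy sketch of the idea (not the repository's code, which operates on packed bits with optimized CPU kernels):

```python
# Toy sketch of LUT-based dot products with 1-bit (+1/-1) weights.
from itertools import product

G = 4  # weights per lookup group

def build_luts(activations):
    """For each activation group, precompute its dot product with all
    16 possible sign patterns of a 4-weight group."""
    luts = []
    for i in range(0, len(activations), G):
        chunk = activations[i:i + G]
        table = {p: sum(s * a for s, a in zip(p, chunk))
                 for p in product((-1, 1), repeat=G)}
        luts.append(table)
    return luts

def lut_dot(weights, luts):
    """Dot product of 1-bit weights with activations via table lookups only."""
    return sum(table[tuple(weights[g * G:(g + 1) * G])]
               for g, table in enumerate(luts))

acts = [0.5, -1.0, 2.0, 0.25, 1.0, 1.0, -0.5, 3.0]
w = [1, -1, 1, 1, -1, 1, 1, -1]
luts = build_luts(acts)   # built once, reused for every weight row
```

The LUT build cost is amortized over all weight rows that share the same activations, which is why the trick pays off for matrix-vector products in LLM inference.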

Mamba SSM architecture

Python · 13,872 stars · 1,193 forks · Updated Jan 18, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 8,321 stars · 814 forks · Updated Feb 2, 2025

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda · 292 stars · 25 forks · Updated Jul 2, 2024

LLM inference in C/C++

C++ · 72,732 stars · 10,479 forks · Updated Feb 2, 2025

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python · 1,920 stars · 235 forks · Updated Jan 20, 2025

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda · 340 stars · 70 forks · Updated Sep 8, 2024

A collection of benchmarks to measure basic GPU capabilities

C++ · 287 stars · 43 forks · Updated Feb 1, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 36,481 stars · 4,223 forks · Updated Feb 2, 2025

High-speed Large Language Model Serving for Local Deployment

C++ · 8,071 stars · 418 forks · Updated Jan 28, 2025

Llama Chinese community. The Llama 3 online demo and fine-tuned models are now open, the latest Llama 3 learning resources are aggregated in real time, and all code has been updated for Llama 3. Building the best Chinese Llama large model, fully open source and commercially usable.

Python · 14,385 stars · 1,287 forks · Updated Sep 5, 2024

Analyze the inference of large language models (LLMs) — computation, storage, transmission, and the hardware roofline model — in a user-friendly interface.

Python · 384 stars · 46 forks · Updated Sep 11, 2024
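The roofline model mentioned above bounds attainable throughput by the smaller of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch of that estimate, with made-up hardware numbers rather than any real GPU's specs:

```python
# Roofline estimate: attainable = min(peak compute, bandwidth * intensity).
# The hardware figures used below are invented for illustration only.

def roofline_tflops(peak_tflops, mem_bw_tbs, flops, bytes_moved):
    """Attainable TFLOPS for an op doing `flops` work over `bytes_moved`."""
    intensity = flops / bytes_moved          # FLOPs per byte
    return min(peak_tflops, mem_bw_tbs * intensity)

peak, bw = 100.0, 2.0                        # assumed: 100 TFLOPS, 2 TB/s
# A GEMV-like op with intensity 2 FLOPs/byte is bandwidth-bound:
attainable = roofline_tflops(peak, bw, flops=2e9, bytes_moved=1e9)
# min(100, 2 * 2) = 4 TFLOPS, far below the compute peak
```

This is exactly why low-bit quantization (fewer bytes moved per FLOP) raises intensity and can lift memory-bound LLM inference toward the compute roof.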

A collection of extensions for Vitis and Intel FPGA OpenCL to improve developer quality of life.

C++ · 312 stars · 58 forks · Updated Jan 20, 2025

Implementation for MatMul-free LM.

Python · 2,957 stars · 187 forks · Updated Nov 5, 2024

A library for efficient similarity search and clustering of dense vectors.

C++ · 32,679 stars · 3,724 forks · Updated Jan 31, 2025
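For the exact-search case such a library covers (faiss's IndexFlatL2), the computation is a brute-force k-nearest-neighbor scan under squared L2 distance. A pure-Python reference of just that computation — faiss itself implements it with SIMD/GPU kernels plus approximate indexes for scale:

```python
# Reference implementation of exact k-NN under squared L2 distance,
# the computation that IndexFlatL2-style exact search performs.

def knn_l2(database, query, k):
    """Return the k nearest database vectors as (distance, index) pairs."""
    dists = []
    for idx, vec in enumerate(database):
        d = sum((q - v) ** 2 for q, v in zip(query, vec))
        dists.append((d, idx))
    return sorted(dists)[:k]

db = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
hits = knn_l2(db, [0.9, 0.1], k=2)  # nearest is index 1, then index 0
```

The quadratic cost of this scan in database size is what motivates the clustering and quantization-based approximate indexes the library provides.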

"A Beginner's Guide to Open-Source Large Models": Linux-based tutorials, tailor-made for Chinese users, on quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large models (MLLMs) from China and abroad.

Jupyter Notebook · 11,845 stars · 1,347 forks · Updated Feb 2, 2025