lee-bin
🏠 Working

Organizations

@XiaoMi

Starred repositories

A playbook for systematically maximizing the performance of deep learning models.

27,671 2,287 Updated Jun 18, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,028 1,043 Updated Dec 26, 2024

A parser, editor and profiler tool for ONNX models.

Python 409 56 Updated Nov 20, 2024

Accessible large language models via k-bit quantization for PyTorch.

Python 6,463 644 Updated Dec 23, 2024

QLoRA: Efficient Finetuning of Quantized LLMs

Jupyter Notebook 10,137 826 Updated Jun 10, 2024

Transformer related optimization, including BERT, GPT

C++ 5,957 897 Updated Mar 27, 2024

Universal LLM Deployment Engine with ML Compilation

Python 19,499 1,602 Updated Dec 19, 2024
Python 196 58 Updated Mar 28, 2023

"Machine Learning Systems: Design and Implementation" (Chinese version)

TeX 4,152 440 Updated Apr 13, 2024

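AISystem mainly covers AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks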

Jupyter Notebook 11,737 1,697 Updated Dec 7, 2024
Python 596 66 Updated Jun 4, 2024

A technical report on convolution arithmetic in the context of deep learning

TeX 14,134 2,292 Updated Jun 8, 2023

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 4,245 1,095 Updated Dec 27, 2024

A tutorial for getting started with the Deep Learning Accelerator (DLA) on NVIDIA Jetson

Python 303 30 Updated May 19, 2022

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

Python 989 215 Updated Aug 28, 2023

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Python 2,002 512 Updated Jul 27, 2024

C/C++ Performance Profiler

C++ 4,245 350 Updated Dec 26, 2024

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

2,724 313 Updated Aug 14, 2024

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python 2,282 258 Updated Dec 29, 2024

Code for our paper at ECCV 2020: Post-Training Piecewise Linear Quantization for Deep Neural Networks

Python 68 13 Updated Nov 4, 2021

10x faster matrix and vector operations

C++ 2,480 170 Updated Oct 12, 2022

CPU INFOrmation library (x86/x86-64/ARM/ARM64, Linux/Windows/Android/macOS/iOS)

C 1,032 328 Updated Dec 9, 2024

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 1,914 376 Updated Dec 29, 2024

Open standard for machine learning interoperability

Python 18,161 3,692 Updated Dec 29, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 85,362 22,983 Updated Dec 30, 2024

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 15,134 2,974 Updated Dec 30, 2024

Application Binary Interface for the Arm® Architecture

HTML 980 194 Updated Dec 13, 2024

A cross platform C99 library to get cpu features at runtime.

C++ 2,476 266 Updated Dec 18, 2024

stb single-file public domain libraries for C/C++

C 27,382 7,733 Updated Nov 9, 2024
C++ 305 85 Updated Dec 20, 2024