ethnzhng 🪼
  • San Francisco Bay Area

Stars

llm

11 repositories

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python | 140,590 stars | 28,187 forks | Updated Mar 4, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ | 9,603 stars | 1,126 forks | Updated Mar 4, 2025

Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.

Go | 130,945 stars | 10,737 forks | Updated Mar 4, 2025

The Triton TensorRT-LLM Backend

Python | 793 stars | 115 forks | Updated Mar 4, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python | 40,200 stars | 6,021 forks | Updated Mar 4, 2025

Large Language Model Text Generation Inference

Python | 9,840 stars | 1,155 forks | Updated Mar 4, 2025

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook | 102,231 stars | 16,570 forks | Updated Mar 4, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python | 13,243 stars | 2,717 forks | Updated Mar 4, 2025

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services

Python | 498 stars | 158 forks | Updated Feb 6, 2025

A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deploym…

Python | 759 stars | 55 forks | Updated Mar 3, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python | 11,322 stars | 1,135 forks | Updated Mar 4, 2025