llm
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
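A minimal sketch of running one of these models through the Transformers pipeline API; the model name is chosen only for illustration.

```python
from transformers import pipeline

# Text-generation pipeline; any causal LM from the Hub can be substituted.
generator = pipeline("text-generation", model="gpt2")
print(generator("Hello, my name is", max_new_tokens=20)[0]["generated_text"])
```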
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
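A rough sketch of that high-level Python API, assuming a recent TensorRT-LLM release with the `LLM` class and an NVIDIA GPU; the model name is illustrative.

```python
from tensorrt_llm import LLM, SamplingParams

# Engine building/loading happens behind this call; requires an NVIDIA GPU.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
sampling = SamplingParams(temperature=0.8, top_p=0.95)
for output in llm.generate(["The capital of France is"], sampling):
    print(output.outputs[0].text)
```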
Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
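A small sketch of querying a locally running Ollama server over its REST API, assuming `ollama serve` is running on the default port and the model has already been pulled.

```python
import requests

# Non-streaming generation request against the default Ollama port.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```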
The Triton TensorRT-LLM Backend: the Triton Inference Server backend for serving TensorRT-LLM engines.
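A hedged sketch of a client request against a Triton server running this backend, assuming the quickstart `ensemble` model; the input and output field names (`text_input`, `max_tokens`, `text_output`) come from that quickstart and may differ in other deployments.

```python
import requests

# Triton's HTTP generate endpoint for the TensorRT-LLM ensemble model.
resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json={"text_input": "What is machine learning?", "max_tokens": 32},
)
print(resp.json()["text_output"])
```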
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
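A minimal offline-inference sketch with vLLM's `LLM` and `SamplingParams` classes; the model name is illustrative.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal LM works here
sampling = SamplingParams(temperature=0.8, top_p=0.95)
for output in llm.generate(["The capital of France is"], sampling):
    print(output.outputs[0].text)
```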
Text Generation Inference (TGI): Hugging Face's toolkit for deploying and serving Large Language Models.
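A small client sketch against a running TGI server, assumed here to be listening on port 8080, using its `/generate` endpoint.

```python
import requests

# TGI exposes a /generate route; generation options go under "parameters".
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}},
)
print(resp.json()["generated_text"])
```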
🦜🔗 LangChain: Build context-aware reasoning applications.
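A minimal LangChain sketch chaining a prompt, a chat model, and an output parser; it assumes the `langchain-openai` integration package is installed and an `OPENAI_API_KEY` is set in the environment.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Prompt -> chat model -> string parser, composed with the LCEL pipe operator.
prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
print(chain.invoke({"question": "What is retrieval-augmented generation?"}))
```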
NVIDIA NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, multimodal models, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
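One hedged sketch of the speech side of the framework, assuming the `nemo_toolkit` package is installed and the pretrained ASR checkpoint name is available for download; the audio path is illustrative.

```python
import nemo.collections.asr as nemo_asr

# Download a pretrained ASR checkpoint and transcribe a local audio file.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_large")
print(asr_model.transcribe(["sample.wav"]))
```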
AWS Neuron: Powering AWS's purpose-built machine learning chips (Inferentia and Trainium). Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow and with your favorite AWS services.
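A hedged sketch of the PyTorch integration via `torch-neuronx`; it assumes a Neuron-enabled instance (e.g. Inf2/Trn1) with the SDK installed, and uses a toy model purely for illustration.

```python
import torch
import torch_neuronx

# Trace/compile a toy model into a Neuron-optimized graph, then run it.
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU()).eval()
example = torch.rand(1, 8)
neuron_model = torch_neuronx.trace(model, example)
print(neuron_model(example).shape)
```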
NVIDIA TensorRT Model Optimizer: A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM and TensorRT.
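A rough post-training quantization sketch using the library's PyTorch API (`modelopt.torch.quantization`); the config name, toy model, and calibration loop are illustrative assumptions, not a definitive recipe.

```python
import torch
import modelopt.torch.quantization as mtq

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).cuda()

def forward_loop(m):
    # A few calibration batches so the quantizer can collect activation ranges.
    for _ in range(8):
        m(torch.rand(4, 16).cuda())

# INT8 post-training quantization; other configs (e.g. FP8) follow the same pattern.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```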
SGLang is a fast serving framework for large language models and vision language models.
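A small sketch of SGLang's frontend DSL, assuming an SGLang server has been launched locally on the default port 30000; the function and variable names are illustrative.

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Append a user turn, then let the model fill in the assistant turn.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is the capital of France?")
print(state["answer"])
```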