Stars
GitHub mirror of the triton-lang/triton repo.
Virtual whiteboard for sketching hand-drawn-like diagrams.
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves effective training time by minimizing downtime due to failures.
NVIDIA Linux open GPU kernel modules with P2P support.
veRL: Volcano Engine Reinforcement Learning for LLM
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Collection of AWESOME vision-language models for vision tasks
This is originally a collection of papers on neural network accelerators. Now it's more like my selection of research on deep learning and computer architecture.
A native PyTorch Library for large model training
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Applied AI experiments and examples for PyTorch
SGLang is a fast serving framework for large language models and vision language models.
A throughput-oriented high-performance serving framework for LLMs
A fast communication-overlapping library for tensor parallelism on GPUs.
A low-latency & high-throughput serving engine for LLMs
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures.
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
A scalable and robust tree-based speculative decoding algorithm.
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
A "large" language model running on a microcontroller
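Several entries above (Sequoia, TriForce) center on speculative decoding. A minimal toy sketch of the draft-then-verify loop they build on — the two "models" here are hypothetical lookup tables standing in for a cheap draft model and an expensive target model, not any repo's actual API:

```python
def draft_next(tok):
    # Cheap draft model: fast but sometimes wrong about the next token.
    return {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}.get(tok, "<eos>")

def target_next(tok):
    # Expensive target model: treated as ground truth.
    return {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}.get(tok, "<eos>")

def speculate(prompt, k=4):
    """Draft k tokens autoregressively, then keep the prefix the target
    agrees with, plus one corrected token from the target (the standard
    greedy accept rule)."""
    out = list(prompt)
    # 1. Draft model proposes k tokens.
    proposal, cur = [], out[-1]
    for _ in range(k):
        cur = draft_next(cur)
        proposal.append(cur)
    # 2. Target verifies the whole proposal in one pass.
    accepted, cur = [], out[-1]
    for tok in proposal:
        want = target_next(cur)
        if tok == want:
            accepted.append(tok)
            cur = tok
        else:
            accepted.append(want)  # take the target's correction and stop
            break
    return out + accepted

print(speculate(["the"]))  # -> ['the', 'cat', 'sat', 'on', 'the']
```

The win in practice is that step 2 runs as a single batched forward pass of the target model, so several tokens cost roughly one target-model invocation when the draft model guesses well.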