FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,217 515 Updated Dec 13, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 73 26 Updated Jul 18, 2024

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…

Python 619 45 Updated Dec 4, 2024

The Triton TensorRT-LLM Backend

Python 724 109 Updated Dec 11, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,884 1,025 Updated Dec 11, 2024

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 26,610 5,479 Updated Dec 12, 2024

Let us control diffusion models!

Python 30,894 2,776 Updated Feb 25, 2024

WebUI extension for ControlNet

Python 17,155 1,974 Updated Aug 12, 2024

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,570 372 Updated Dec 4, 2024

Stable Diffusion web UI

Python 144,291 27,105 Updated Nov 28, 2024

Fast and memory-efficient exact attention

Python 14,609 1,369 Updated Dec 12, 2024

GLIDE: a diffusion-based text-conditional image synthesis model

Python 3,562 507 Updated Mar 8, 2024

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,687 449 Updated Oct 9, 2023

TensorRT Plugin Autogen Tool

Python 366 42 Updated Apr 7, 2023

Transformer related optimization, including BERT, GPT

C++ 5,925 895 Updated Mar 27, 2024

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)

Python 467 78 Updated Aug 2, 2024

a language for fast, portable data-parallel computation

C++ 5,926 1,072 Updated Dec 12, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

C++ 1 Updated Jul 21, 2021

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 4,223 1,091 Updated Nov 8, 2024

Assembler for NVIDIA Volta and Turing GPUs

Python 203 40 Updated Jan 13, 2022

CUDA Templates for Linear Algebra Subroutines

C++ 5,815 1,004 Updated Dec 11, 2024

Simple samples for TensorRT programming

Python 1,535 342 Updated Nov 2, 2024

Implementation for PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation (CVPR 2020)

Python 382 80 Updated May 19, 2021

An Open Source Machine Learning Framework for Everyone

C++ 1,011 157 Updated Sep 25, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 136,202 27,285 Updated Dec 12, 2024

Source code examples from the Parallel Forall Blog

HTML 1,244 634 Updated Jul 23, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 8,442 1,490 Updated Dec 12, 2024

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Python 15,881 4,881 Updated Aug 1, 2024

a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.

C++ 1,490 198 Updated Jun 12, 2023