  • Yandex Ads
  • Moscow, Russia


10x faster matrix and vector operations

C++ 2,479 170 Updated Oct 12, 2022

Accelerating Diffusion Transformers with Token-wise Feature Caching

Python 32 1 Updated Nov 6, 2024

Efficient AI Inference & Serving

Python 461 26 Updated Jan 8, 2024

A Python library that transfers PyTorch tensors between CPU and NVMe

C++ 100 19 Updated Nov 27, 2024

✈️ Accelerating Vision Diffusion Transformers with Skip Branches.

Python 51 Updated Dec 12, 2024

A very simple and barebones tensor decomposition library for CP decomposition a.k.a. PARAFAC a.k.a. TCA

Python 163 66 Updated Jan 10, 2024

TensorLy: Tensor Learning in Python.

Python 1,576 289 Updated Dec 16, 2024

Stateful load balancer custom-tailored for llama.cpp 🏓🦙

Rust 650 27 Updated Dec 7, 2024

Making large AI models cheaper, faster and more accessible

Python 38,919 4,349 Updated Dec 17, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,671 2,225 Updated Nov 28, 2024

jax-triton contains integrations between JAX and OpenAI Triton

Python 361 40 Updated Dec 18, 2024

Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡

Python 453 30 Updated Dec 18, 2024

RUDOLPH: One Hyper-Tasking Transformer can be creative as DALL-E and GPT-3 and smart as CLIP

Jupyter Notebook 255 29 Updated Feb 6, 2023

Tensor library for machine learning

C++ 11,383 1,062 Updated Dec 18, 2024

Efficient LLM Inference over Long Sequences

Python 313 12 Updated Dec 6, 2024

PyTorch bindings for CUTLASS grouped GEMM for MoE.

Cuda 4 Updated May 12, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 74 26 Updated Jul 18, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 56 40 Updated Oct 31, 2024

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 333 30 Updated Oct 18, 2024

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 30,788 2,842 Updated Dec 18, 2024

Library for reading and processing ML training data.

Python 338 23 Updated Dec 17, 2024

A language for fast, portable data-parallel computation

C++ 5,928 1,073 Updated Dec 18, 2024

Best practices & guides on how to write distributed pytorch training code

Python 317 20 Updated Dec 16, 2024

A native PyTorch Library for large model training

Python 2,769 224 Updated Dec 18, 2024

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Python 657 50 Updated Sep 13, 2023

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python 108 8 Updated Dec 9, 2024

Development repository for the Triton language and compiler

C++ 13,721 1,683 Updated Dec 18, 2024

Build system, successor to Buck

Rust 3,637 233 Updated Dec 18, 2024