  • Yandex Ads
  • Moscow, Russia

321 results for source starred repositories

10x faster matrix and vector operations

C++ 2,479 170 Updated Oct 12, 2022

Accelerating Diffusion Transformers with Token-wise Feature Caching

Python 33 1 Updated Dec 20, 2024

Efficient AI Inference & Serving

Python 461 26 Updated Jan 8, 2024

A Python library that transfers PyTorch tensors between CPU and NVMe

C++ 101 19 Updated Nov 27, 2024

✈️ Accelerating Vision Diffusion Transformers with Skip Branches.

Python 51 Updated Dec 12, 2024

A very simple and barebones tensor decomposition library for CP decomposition a.k.a. PARAFAC a.k.a. TCA

Python 163 66 Updated Jan 10, 2024

TensorLy: Tensor Learning in Python.

Python 1,577 290 Updated Dec 16, 2024

Stateful load balancer custom-tailored for llama.cpp 🏓🦙

Rust 651 27 Updated Dec 7, 2024

Making large AI models cheaper, faster and more accessible

Python 38,933 4,348 Updated Dec 17, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,707 2,230 Updated Dec 20, 2024

jax-triton contains integrations between JAX and OpenAI Triton

Python 361 41 Updated Dec 20, 2024

Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡

Python 453 30 Updated Dec 18, 2024

RUDOLPH: One Hyper-Tasking Transformer can be as creative as DALL-E and GPT-3 and as smart as CLIP

Jupyter Notebook 255 29 Updated Feb 6, 2023

Tensor library for machine learning

C++ 11,405 1,067 Updated Dec 19, 2024

Efficient LLM Inference over Long Sequences

Python 322 14 Updated Dec 6, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 57 41 Updated Oct 31, 2024

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 335 30 Updated Oct 18, 2024

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 30,818 2,844 Updated Dec 22, 2024

Library for reading and processing ML training data.

Python 340 24 Updated Dec 21, 2024

a language for fast, portable data-parallel computation

C++ 5,930 1,073 Updated Dec 20, 2024

Best practices & guides on how to write distributed PyTorch training code

Python 318 21 Updated Dec 16, 2024

A native PyTorch Library for large model training

Python 2,776 227 Updated Dec 20, 2024

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Python 658 50 Updated Sep 13, 2023

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python 108 8 Updated Dec 9, 2024

Development repository for the Triton language and compiler

C++ 13,749 1,684 Updated Dec 22, 2024

Build system, successor to Buck

Rust 3,639 233 Updated Dec 21, 2024

On-device AI across mobile, embedded and edge for PyTorch

C++ 2,318 403 Updated Dec 22, 2024

LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.

C++ 198 16 Updated Dec 21, 2024