Stars
Accelerating Diffusion Transformers with Token-wise Feature Caching
A Python library that transfers PyTorch tensors between CPU and NVMe
A very simple and barebones tensor decomposition library for CP decomposition, a.k.a. PARAFAC, a.k.a. TCA
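Not this library's interface, but a minimal NumPy sketch of what a CP/PARAFAC solver computes: alternating least squares over the three factor matrices of a 3-way tensor (all names here are illustrative):

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (J x R), (K x R) -> (J*K x R)."""
    J, R = U.shape
    K, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(J * K, R)

def cp_als(X, rank, n_iter=100, seed=0):
    """Rank-`rank` CP/PARAFAC of a 3-way tensor X via alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings of X (row-major, so X[i, j, k] -> X1[i, j*K + k], etc.).
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

A, B, C = cp_als(np.random.default_rng(1).random((5, 6, 7)), rank=3)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)   # rank-3 reconstruction
```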
Stateful load balancer custom-tailored for llama.cpp 🏓🦙
Making large AI models cheaper, faster and more accessible
Open-Sora: Democratizing Efficient Video Production for All
jax-triton contains integrations between JAX and OpenAI Triton
Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡
RUDOLPH: One Hyper-Tasking Transformer that can be as creative as DALL-E and GPT-3 and as smart as CLIP
Efficient LLM Inference over Long Sequences
A tool for bandwidth measurements on NVIDIA GPUs.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
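The three transforms named above compose directly; a minimal example using the standard JAX API:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # A tiny scalar-valued function of parameters w and one input x.
    return jnp.sum((x @ w) ** 2)

grad_loss = jax.grad(loss)                          # differentiate w.r.t. w
batched = jax.vmap(grad_loss, in_axes=(None, 0))    # vectorize over a batch of x
fast = jax.jit(batched)                             # JIT-compile for GPU/TPU

w = jnp.ones((3, 2))
xs = jnp.ones((8, 3))         # batch of 8 inputs
print(fast(w, xs).shape)      # (8, 3, 2): one gradient per example
```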
Library for reading and processing ML training data.
A language for fast, portable data-parallel computation
Best practices & guides on how to write distributed pytorch training code
A native PyTorch library for large model training
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
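Not this repository's API; a generic PyTorch sketch of the core idea, top-k gating that routes each token to a few expert MLPs (all class and parameter names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, dim)
        logits = self.gate(x)                  # (tokens, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen k
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity; real implementations dispatch sparsely.
        for e, expert in enumerate(self.experts):
            mask = idx == e                    # (tokens, k) hits for expert e
            if mask.any():
                rows = mask.any(dim=-1)
                w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out

moe = TopKMoE(dim=64)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Only k of the experts run per token, which is how parameter count grows without a matching growth in per-token compute.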
PyTorch library for cost-effective, fast and easy serving of MoE models.
Development repository for the Triton language and compiler
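A minimal Triton kernel in the style of the official vector-add tutorial: each program instance handles one block of elements, with a mask guarding the tail (requires a CUDA GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)       # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```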
On-device AI across mobile, embedded, and edge devices for PyTorch
LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.