seb-sep's starred repositories

Learning how to write "Less Slow" code in C++ 20, C 99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO

C++ 450 33 Updated Feb 27, 2025

Tile primitives for speedy kernels

Cuda 2,088 119 Updated Mar 1, 2025

Fast low-bit matmul kernels in Triton

Python 250 19 Updated Feb 24, 2025

An open-source invisible desktop application to help you pass your technical interviews.

TypeScript 2,235 300 Updated Feb 27, 2025

lsblk in go for apple computers

Go 7 Updated Nov 3, 2024

A categorized list of C++ resources.

4,794 497 Updated Mar 1, 2025

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,449 243 Updated Feb 20, 2025

CUDA on non-NVIDIA GPUs

Rust 10,803 695 Updated Feb 24, 2025

SPIRV-Cross is a practical tool and library for performing reflection on SPIR-V and disassembling SPIR-V back to high level languages.

GLSL 2,151 579 Updated Feb 18, 2025

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 2,104 76 Updated Feb 27, 2025

Whisper with Medusa heads

Python 823 51 Updated Feb 26, 2025

A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support

Python 15,436 526 Updated Feb 27, 2025

convert images, video to ascii!

Zig 316 11 Updated Feb 19, 2025

FlashAttention (Metal Port)

Swift 439 19 Updated Sep 22, 2024

Everything we actually know about the Apple Neural Engine (ANE)

2,160 77 Updated Sep 23, 2024

Apple GPU microarchitecture

Metal 500 23 Updated Sep 22, 2024

LLM101n: Let's build a Storyteller

32,039 1,739 Updated Aug 1, 2024

Efficient Triton Kernels for LLM Training

Python 4,523 274 Updated Mar 1, 2025

LLM training in simple, raw C/Metal Shading Language

Cuda 47 2 Updated Apr 24, 2024

🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton

Python 2,023 124 Updated Mar 1, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,114 649 Updated Feb 28, 2025

Dolphin for iOS, reborn

C++ 321 39 Updated Jan 17, 2025

A port of https://www.github.com/n64decomp/sm64 for modern devices.

C 1,054 162 Updated Nov 15, 2024

ONNX Serving is a project written with C++ to serve onnx-mlir compiled models with GRPC and other protocols. Benefiting from C++ implementation, ONNX Serving has very low latency overhead and high t…

C++ 22 3 Updated Oct 24, 2023

A Super Mario 64 decompilation, brought to you by a bunch of clever folks.

C 7,923 1,412 Updated Feb 4, 2024

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 752 45 Updated Mar 1, 2025

seqax = sequence modeling + JAX

Python 145 12 Updated Feb 26, 2025

Official implementation of Half-Quadratic Quantization (HQQ)

Python 760 76 Updated Feb 24, 2025

A CocoaPods plugin to add SPM dependencies to CocoaPods-based projects

Ruby 67 12 Updated Feb 27, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 37,130 4,270 Updated Mar 1, 2025