Starred repositories
Learning how to write "Less Slow" code in C++20, C99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
An open-source invisible desktop application to help you pass your technical interviews.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
SPIRV-Cross is a practical tool and library for performing reflection on SPIR-V and disassembling SPIR-V back to high level languages.
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
Everything we actually know about the Apple Neural Engine (ANE)
Efficient Triton Kernels for LLM Training
regrettable-username / llm.metal
Forked from karpathy/llm.c
LLM training in simple, raw C/Metal Shading Language
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
Hackable and optimized Transformers building blocks, supporting a composable construction.
sm64-port / sm64-port
Forked from n64decomp/sm64
A port of https://www.github.com/n64decomp/sm64 for modern devices.
ONNX Serving is a project written in C++ to serve onnx-mlir compiled models with gRPC and other protocols. Benefiting from its C++ implementation, ONNX Serving has very low latency overhead and high t…
A Super Mario 64 decompilation, brought to you by a bunch of clever folks.
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
Official implementation of Half-Quadratic Quantization (HQQ)
A CocoaPods plugin to add SPM dependencies to CocoaPods-based projects
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.