Marek Kolodziej mkolod

156 followers · 9 following

@google
San Francisco Bay Area, CA

Stars

mkolod / fast_upsampling

Cuda 33 2 Updated Jul 19, 2024

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 39,823 6,528 Updated Dec 9, 2024

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,237 370 Updated Mar 4, 2025

google / psp

C 254 24 Updated May 26, 2024

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 3,010 669 Updated Mar 4, 2025

bazel-contrib / rules_cuda

Starlark implementation of bazel rules for CUDA.

Starlark 97 47 Updated Mar 4, 2025

cloudhan / rules_cuda_examples

This repo holds the extended examples for rules_cuda.

Starlark 7 1 Updated Jul 24, 2023

traveller59 / spconv

Spatial Sparse Convolution Library

Python 1,971 370 Updated Dec 15, 2024

NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors

Python 2,585 375 Updated Mar 5, 2024

Apress / data-parallel-CPP

Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xin…

CMake 264 86 Updated Jan 6, 2025

wb2osz / direwolf

Dire Wolf is a software "soundcard" AX.25 packet modem/TNC and APRS encoder/decoder. It can be used stand-alone to observe APRS traffic, as a tracker, digipeater, APRStt gateway, or Internet Gatewa…

C 1,655 311 Updated Oct 29, 2024

km4ack / pi-scripts

Various scripts written for ham radio pi

Shell 111 40 Updated Jan 13, 2025

km4ack / pi-build

Shell 339 70 Updated Apr 18, 2024

ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

C++ 2,925 787 Updated Feb 17, 2025

reHackable / awesome-reMarkable

A curated list of projects related to the reMarkable tablet

6,544 221 Updated Feb 21, 2025

cybertronai / gradient-checkpointing

Make huge neural nets fit in memory

Python 2,767 272 Updated Apr 26, 2020

OasisDigital / bazel-examples

Deprecated - see our other repos for Bazel examples

Java 10 5 Updated Mar 22, 2022

Xilinx / Get_Moving_With_Alveo

For publishing the source for UG1352 "Get Moving with Alveo"

C++ 50 16 Updated Jun 17, 2020

icgrp / ese532_code

C++ 9 17 Updated Oct 22, 2024

Xilinx / Vitis_Accel_Examples

Vitis_Accel_Examples

Makefile 529 216 Updated Feb 20, 2025

Xilinx / Vitis-Tutorials

Vitis In-Depth Tutorials

C 1,327 560 Updated Mar 4, 2025

Xilinx / Vitis-AI-Tutorials

427 150 Updated Jun 12, 2024

Avnet / bdf

Avnet Board Definition Files

Tcl 131 68 Updated Jan 9, 2025

rogerallen / raytracinginoneweekendincuda

Forked from pfranz/raytracinginoneweekend

The code for the ebook Ray Tracing in One Weekend by Peter Shirley translated to CUDA by Roger Allen. This work is in the public domain.

C++ 345 87 Updated Jan 29, 2021

Xilinx / brevitas

Brevitas: neural network quantization in PyTorch

Python 1,263 206 Updated Mar 4, 2025

hzeller / rpi-gpio-dma-demo

Performance writing to GPIO with CPU and DMA on the Raspberry Pi

C 201 27 Updated Jul 15, 2024

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 11,274 2,160 Updated Feb 1, 2025

NVIDIA / open-gpu-doc

Documentation of NVIDIA chip/hardware interfaces

C 1,268 92 Updated Sep 10, 2024

sandeepkumar-skb / PyTorch_TRT_Experiments

Python 2 1 Updated Feb 8, 2023

NVIDIA-developer-blog / code-samples

Source code examples from the Parallel Forall Blog

HTML 1,265 638 Updated Jul 23, 2024