Stars
FpgaNIC is an FPGA-based Versatile 100Gb SmartNIC for GPUs [ATC 22]
Implementation of a Tensor Processing Unit for embedded systems and the IoT.
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
A PyTorch implementation of DoReFa quantization
A reading list for deep graph learning acceleration.
PyTorch implementation of DeltaLSTM and Column-Balanced Targeted Dropout
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
An unnecessarily tiny implementation of GPT-2 in NumPy.
The code for our ICCAD work "Optimized Data Reuse via Reordering for Sparse Matrix-Vector Multiplication on FPGAs"
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
A high-level performance analysis tool for FPGA-based accelerators
Matplotlib styles for scientific plotting
Java inefficiency detection tool based on CPU performance monitoring counters and hardware debug registers. The tool detects dead writes, silent stores, and redundant loads.
This is a MIPS simulator I wrote once to help my understanding of pipelines, branch prediction, assembly language, and more.