Practice on cifar100 (ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext, ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet…

Python 4,332 1,179 Updated Jul 15, 2024

intel / pepc

Pepc - Power, Energy, and Performance Configurator

Python 29 8 Updated Nov 29, 2024

getianao / ngAP

ngAP's artifact for ASPLOS'24

C++ 19 Updated Dec 10, 2024

vietnh1009 / ASCII-generator

ASCII generator (image to text, image to image, video to video)

Python 7,421 570 Updated Nov 22, 2024

NVIDIA / cuda-python

CUDA Python Low-level Bindings

Python 995 81 Updated Dec 12, 2024

cyjseagull / gem5-nvmain-hybrid-simulator

gem5-nvmain hybrid simulator supporting simulation of DRAM-NVM hybrid memory systems

C++ 74 49 Updated Jul 23, 2019

cyjseagull / SHMA

SHMA: Software-managed Caching for Hybrid DRAM/NVM Memory Architectures, implemented with zsim and nvmain hybrid simulators

C++ 60 34 Updated Aug 26, 2017

AutomataLab / Tigr

Transforming Graphs for Efficient Irregular Graph Processing on GPUs

Cuda 47 16 Updated Nov 15, 2022

SET-Scheduling-Project / GEMINI-HPCA2024

Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators

C++ 59 10 Updated Aug 31, 2024

VIA-Research / vTrain

Python 40 7 Updated Sep 26, 2024

PrincetonUniversity / LLMCompass

Python 92 24 Updated Jul 1, 2024

lchangxii / sampled-mgpu-sim

Go 2 Updated Aug 29, 2023

intel / intel-extension-for-pytorch

A Python package that extends the official PyTorch to easily improve performance on Intel platforms

Python 1,645 254 Updated Dec 12, 2024

TUM-DSE / CVM_eval

Evaluation code for confidential virtual machines (AMD SEV-SNP / Intel TDX)

Python 4 1 Updated Dec 11, 2024

csl-iisc / SUV-MICRO24

C++ 4 1 Updated Oct 6, 2024

mayshin10 / GPGPU_Sim-Enabled-Turing-WMMA-API

GPGPU-Sim enabled Turing WMMA API and its benchmark results. Undergraduate study at Yonsei Univ.

C++ 9 6 Updated Feb 21, 2021

likejazz / llama3.cuda

llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.

Cuda 317 21 Updated Jun 4, 2024

ggerganov / llama.cpp

LLM inference in C/C++

C++ 69,142 9,930 Updated Dec 12, 2024

joydddd / Toleo

C++ 4 1 Updated Nov 10, 2024

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 10,923 2,141 Updated Dec 5, 2024

narger-ef / LowMemoryFHEResNet20

Source code for the paper "Encrypted Image Classification with Low Memory Footprint using Fully Homomorphic Encryption"

Jupyter Notebook 43 13 Updated Jul 20, 2024

BUAA-CI-LAB / Literatures-on-Homomorphic-Encryption

A reading list for homomorphic encryption

106 7 Updated Aug 1, 2024

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,195 622 Updated Dec 12, 2024