Stars
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '23)
Ongoing research training transformer models at scale
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
[ICML 2022] Channel Importance Matters in Few-shot Image Classification
Code for "Adaptive Gradient Quantization for Data-Parallel SGD", published in NeurIPS 2020.
Sparsified SGD with Memory: https://arxiv.org/abs/1809.07599
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with th…
Code for "On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length", ICLR 2019
Code for reproducing experiments performed for Accordion
Understanding Top-k Sparsification in Distributed Deep Learning
PyTorch implementation of CNNs
Practice on CIFAR-100 (ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext, ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet…
Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
Rethinking gradient sparsification as total error minimization
SIDCo is an efficient statistical-based gradient compression technique for distributed training systems
A Tool for Automatic Parallelization of Deep Learning Training in Distributed Multi-GPU Environments.
Pressio is Latin for compression. LibPressio is a C++ library with C-compatible bindings to abstract between different lossless and lossy compressors and their configurations. It solves the problem…
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
Error-bounded Lossy Data Compressor (for floating-point/integer datasets)
A distributed SGD algorithm for Matrix Factorization using PySpark
Implementation of vector quantization algorithms; code for Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search.
Network-Accelerated Distributed Deep Learning
A high performance and generic framework for distributed DNN training
Multi Model Server is a tool for serving neural net models for inference
SGD with compressed gradients and error-feedback: https://arxiv.org/abs/1901.09847
A quickstart and benchmark for PyTorch distributed training.
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.