Summary of system papers, frameworks, code, and tools for training or serving large models

taiqzheng/awesome-lm-system

 
 

Awesome Large Model (LM) System

This repo collects papers, repos, and tools for large model systems, covering training, inference, serving, and compression.

Papers

Training

| Year | Publisher | Title | Framework |
| --- | --- | --- | --- |
| 2023 | | QLoRA: Efficient Finetuning of Quantized LLMs | |
| 2023 | | Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases | DeepSpeed |
| 2023 | ICLR | DySR: Adaptive Super-Resolution via Algorithm and System Co-design | DeepSpeed |
| 2023 | | Scaling Vision-Language Models with Sparse Mixture of Experts | DeepSpeed |
| 2023 | IPDPS | MCR-DL: Mix-and-Match Communication Runtime for Deep Learning | DeepSpeed |
| 2023 | ICS | A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | DeepSpeed |
| 2023 | OSDI | AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving | Alpa |
| 2023 | MLSys | On Optimizing the Communication of Model Parallelism | Alpa |
| 2023 | | Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models | ColossalAI |
| 2022 | | Reducing Activation Recomputation in Large Transformer Models | Megatron-LM |
| 2022 | HiPC | 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed | DeepSpeed |
| 2022 | NeurIPS | The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | DeepSpeed |
| 2022 | | Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam | DeepSpeed |
| 2022 | ICML | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | DeepSpeed |
| 2022 | | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | DeepSpeed |
| 2022 | NeurIPS | Extreme Compression for Pre-trained Transformers Made Simple and Efficient | DeepSpeed |
| 2022 | NeurIPS | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | DeepSpeed |
| 2022 | | Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers | DeepSpeed |
| 2022 | | DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing | DeepSpeed |
| 2022 | OSDI | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | Alpa |
| 2022 | ICPP | Tesseract: Parallelize the Tensor Parallelism Efficiently | ColossalAI |
| 2022 | | A Frequency-aware Software Cache for Large Recommendation System Embeddings | ColossalAI |
| 2022 | TPDS | Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management | ColossalAI |
| 2021 | | Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Megatron-LM |
| 2021 | | LoRA: Low-Rank Adaptation of Large Language Models | |
| 2021 | SC | ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning | DeepSpeed |
| 2021 | ICML | 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed | DeepSpeed |
| 2021 | ATC | ZeRO-Offload: Democratizing Billion-Scale Model Training | DeepSpeed |
| 2021 | PPoPP | DAPPLE: A Pipelined Data Parallel Approach for Training Large Models | |
| 2021 | ICML | TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models | TeraPipe |
| 2021 | ICML | Memory-Efficient Pipeline-Parallel DNN Training | PipeDream |
| 2021 | | An Efficient 2D Method for Training Super-Large Deep Learning Models | ColossalAI |
| 2021 | | Maximizing Parallelism in Distributed Training for Huge Neural Networks | ColossalAI |
| 2021 | | Sequence Parallelism: Long Sequence Training from System Perspective | ColossalAI |
| 2021 | | Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | ColossalAI |
| 2020 | KDD Tutorial | DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters | DeepSpeed |
| 2020 | SC | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | DeepSpeed |
| 2020 | NeurIPS | Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping | DeepSpeed |
| 2020 | | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Megatron-LM |
| 2020 | | torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models | TorchGpipe |
| 2019 | NeurIPS | GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism | TorchGpipe |
| 2019 | SOSP | PipeDream: Generalized Pipeline Parallelism for DNN Training | PipeDream |

Inference

| Year | Publisher | Title | Framework |
| --- | --- | --- | --- |
| 2023 | | EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models | EnergonAI |
| 2022 | ICML | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | DeepSpeed |
| 2022 | SC | DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | DeepSpeed |

Benchmark

| Year | Publisher | Title | Framework |
| --- | --- | --- | --- |

Survey

| Year | Publisher | Title | Framework |
| --- | --- | --- | --- |

Frameworks

| Year | Name | Training | Inference | Serving | Comments |
| --- | --- | --- | --- | --- | --- |
| 2023 | EnergonAI | | | | |
| 2022 | Alpa | | | | Compilation-based mixed parallelism |
| 2021 | Megatron-DeepSpeed | | | | Adds MoE model training, curriculum learning, and 3D parallelism from DeepSpeed to Megatron |
| 2021 | TeraPipe | | | | |
| 2021 | ColossalAI | | | | |
| 2021 | FasterTransformer | | | | |
| 2020 | DeepSpeed | | | | General support for Transformers and MoE with 3D parallelism |
| 2019 | Megatron-LM | | | | |
| 2019 | PipeDream | | | | |
| 2019 | TorchGpipe | | | | torchgpipe was merged into PyTorch in 2020. |
