A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Telsho/DualPipe

 
 

DualPipe

DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It achieves full overlap of the forward and backward computation-communication phases and also reduces pipeline bubbles. For details on computation-communication overlap, refer to the profile data.

Schedules

[Figure: DualPipe schedules]

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so their batch IDs are omitted to keep the illustration simple. Two cells enclosed by a shared black border have mutually overlapped computation and communication.

Pipeline Bubbles and Memory Usage Comparison

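| Method   | Bubble                  | Parameter | Activation |
|----------|-------------------------|-----------|------------|
| 1F1B     | (PP-1)(𝐹+𝐵)             | 1×        | PP         |
| ZB1P     | (PP-1)(𝐹+𝐵-2𝑊)          | 1×        | PP         |
| DualPipe | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊)      | 2×        | PP+1       |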

𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a full backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵 denotes the execution time of two mutually overlapped forward and backward chunks.
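To make the bubble formulas concrete, the sketch below plugs numbers into them. The function name `bubble_times` and the timings are hypothetical, chosen only for illustration: they are arbitrary time units, not measurements, and 𝐹&𝐵 is set equal to 𝐹+𝐵, which assumes overlap saves nothing.

```python
def bubble_times(pp, f, b, w, f_and_b):
    """Return the pipeline-bubble time per method, using the formulas from the table.

    pp      -- number of pipeline-parallel ranks (PP)
    f       -- time of one forward chunk (F)
    b       -- time of one full backward chunk (B)
    w       -- time of one "backward for weights" chunk (W)
    f_and_b -- time of two mutually overlapped forward and backward chunks (F&B)
    """
    return {
        "1F1B": (pp - 1) * (f + b),
        "ZB1P": (pp - 1) * (f + b - 2 * w),
        "DualPipe": (pp / 2 - 1) * (f_and_b + b - 3 * w),
    }


# Hypothetical timings in arbitrary units (assumptions, not profile data).
example = bubble_times(pp=8, f=1.0, b=2.0, w=1.0, f_and_b=3.0)
for method, bubble in example.items():
    print(f"{method:8s} bubble = {bubble}")
# 1F1B     bubble = 21.0
# ZB1P     bubble = 7.0
# DualPipe bubble = 6.0
```

With these assumed values, the DualPipe bubble is the smallest of the three even though 𝐹&𝐵 gets no benefit from overlap. A different ratio between 𝐹, 𝐵, and 𝑊 would change the comparison.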

Quick Start

The following command runs the example script:

python example.py

Note: For real-world applications, you will need to implement a custom overlapped_forward_backward method tailored to your specific module.

Requirements

  • PyTorch 2.0 and above
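
One way to check this requirement before running the example is sketched below. The helper `meets_min_version` is hypothetical and not part of DualPipe. It assumes the version string starts with numeric `major.minor` components, optionally followed by a local build tag such as `+cu118`.

```python
def meets_min_version(version: str, minimum: tuple = (2, 0)) -> bool:
    """Return True if a 'major.minor[.patch][+local]' version string is >= minimum."""
    release = version.split("+", 1)[0]      # drop a local build tag such as "+cu118"
    major, minor = release.split(".")[:2]   # compare only major and minor
    return (int(major), int(minor)) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = meets_min_version(torch.__version__)
        print(torch.__version__, "meets the requirement" if ok else "is older than 2.0")
```

Comparing integer tuples rather than raw strings avoids the pitfall where `"2.10"` sorts before `"2.2"` lexicographically.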

Developers

DualPipe was created and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang.

Citation

@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report}, 
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437}, 
}
