What is Distributed Data Parallel (DDP)
========================================

Authors: Suraj Subramanian

.. grid:: 2

   .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
      :class-card: card-prerequisites

      *  How DDP works under the hood
      *  What is ``DistributedSampler``
      *  How gradients are synchronized across GPUs


   .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
      :class-card: card-prerequisites

      * Familiarity with `basic non-distributed training  <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__ in PyTorch

Follow along with the video below or on YouTube.

This tutorial is a gentle introduction to PyTorch ``DistributedDataParallel`` (DDP), which enables data parallel training in PyTorch. Data parallelism processes multiple data batches across multiple devices simultaneously to achieve better performance. In PyTorch, the ``DistributedSampler`` ensures that each device gets a non-overlapping input batch. The model is replicated on all the devices; each replica computes gradients and simultaneously synchronizes them with the other replicas using the ring all-reduce algorithm.

This illustrative tutorial provides a more in-depth Python view of the mechanics of DDP.
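
The two mechanisms described above, giving each device its own slice of the data and averaging gradients around a ring, can be illustrated with a minimal pure-Python sketch. It is a toy simulation under stated assumptions, not how PyTorch implements either step: ``shard_indices`` and ``ring_all_reduce_mean`` are hypothetical helper names, the "workers" are plain lists inside a single process, the sharding is a simple strided split with no shuffling or padding, and real DDP hands communication to a backend rather than looping in Python.

```python
def shard_indices(num_samples, world_size, rank):
    """Strided split: worker `rank` takes every `world_size`-th sample index."""
    return list(range(rank, num_samples, world_size))


def ring_all_reduce_mean(grads):
    """Average equal-length gradient lists (one per simulated worker) around a ring."""
    n = len(grads)
    size = len(grads[0])
    # Cut every gradient vector into n contiguous chunks; chunk i covers
    # positions bounds[i] .. bounds[i + 1] - 1.
    bounds = [(i * size) // n for i in range(n + 1)]
    bufs = [list(g) for g in grads]  # work on copies, leave the inputs untouched

    def chunk(i):
        i %= n
        return range(bounds[i], bounds[i + 1])

    def ring_step(chunk_to_send, combine):
        # All workers send at the same time, so snapshot every payload
        # before any receiver's buffer is modified.
        sends = []
        for r in range(n):
            c = chunk_to_send(r)
            sends.append((c, [bufs[r][j] for j in chunk(c)]))
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n  # each worker only talks to its right-hand neighbour
            for j, v in zip(chunk(c), payload):
                bufs[dst][j] = combine(bufs[dst][j], v)

    # Phase 1 (scatter-reduce): after n - 1 steps, worker r holds the full
    # sum of chunk (r + 1) % n.
    for step in range(n - 1):
        ring_step(lambda r: r - step, lambda mine, recv: mine + recv)

    # Phase 2 (all-gather): pass the finished chunks around the ring so that
    # every worker ends up with every summed chunk.
    for step in range(n - 1):
        ring_step(lambda r: r + 1 - step, lambda mine, recv: recv)

    # Divide by the number of workers to turn sums into averages.
    return [[v / n for v in buf] for buf in bufs]


# Two simulated workers split 8 samples without overlap.
print([shard_indices(8, world_size=2, rank=r) for r in range(2)])
# → [[0, 2, 4, 6], [1, 3, 5, 7]]

# Three simulated workers end up with the same averaged gradient.
grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
print(ring_all_reduce_mean(grads)[0])
# → [37.0, 74.0, 111.0, 148.0]
```

In this sketch each worker exchanges data only with its neighbour and sends one chunk per step, so the amount of data any single worker transmits stays roughly constant as workers are added. That property is the usual motivation for arranging the reduction as a ring.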

Why you should prefer DDP over ``DataParallel`` (DP)
-------------------------------------------------------

``DataParallel`` is an older approach to data parallelism. DP is trivial to adopt (it takes just one extra line of code), but it is much less performant. DDP improves upon the architecture in a few ways:

.. list-table::
   :header-rows: 1

   * - ``DataParallel``
     - ``DistributedDataParallel``
   * - More overhead; model is replicated and destroyed at each forward pass
     - Model is replicated only once
   * - Only supports single-node parallelism
     - Supports scaling to multiple machines
   * - Slower; uses multithreading on a single process and runs into Global Interpreter Lock (GIL) contention
     - Faster (no GIL contention) because it uses multiprocessing
