Commit

Reorganize comparisons of network partitioning, layer-wise partitioning, and data parallelism
astonzhang committed Apr 21, 2021
1 parent 443be61 commit 5af4d1b
Showing 2 changed files with 857 additions and 631 deletions.
chapter_computational-performance/multiple-gpus.md: 6 changes (2 additions, 4 deletions)
@@ -80,9 +80,7 @@ In general, the training proceeds as follows:

Note that in practice we *increase* the minibatch size $k$-fold when training on $k$ GPUs such that each GPU has the same amount of work to do as if we were training on a single GPU only. On a 16-GPU server this can increase the minibatch size considerably and we may have to increase the learning rate accordingly.
Also note that batch normalization in :numref:`sec_batch_norm` needs to be adjusted, e.g., by keeping a separate batch normalization coefficient per GPU.
-In what follows we will use LeNet in :numref:`sec_lenet` as the toy network to illustrate multi-GPU training.
-
-
+In what follows we will use a toy network to illustrate multi-GPU training.

```{.python .input}
%matplotlib inline
# ...
```

@@ -102,7 +100,7 @@ from torch.nn import functional as F

## A Toy Network

-We use LeNet as introduced in :numref:`sec_lenet`. We define it from scratch to illustrate parameter exchange and synchronization in detail.
+We use LeNet as introduced in :numref:`sec_lenet` with slight modifications. We define it from scratch to illustrate parameter exchange and synchronization in detail.

```{.python .input}
# Initialize model parameters
# ...
```

(Diff of the second changed file is not shown.)
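
The note in the first hunk about growing the minibatch $k$-fold when training on $k$ GPUs, and possibly raising the learning rate with it, can be made concrete with a small sketch. This is only an illustration appended to the diff: the linear learning-rate scaling and the concrete numbers below are assumptions, not something the diffed file specifies.

```python
# Illustrative sketch (not part of the diffed file): scale the minibatch size
# and, heuristically, the learning rate with the number of GPUs.
import torch

num_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 1
base_batch_size = 256   # per-GPU minibatch size (hypothetical value)
base_lr = 0.1           # learning rate tuned for a single GPU (hypothetical value)

effective_batch_size = base_batch_size * num_gpus  # k-fold larger minibatch
scaled_lr = base_lr * num_gpus                     # linear scaling heuristic, an assumption

print(f'{num_gpus} GPU(s): batch size {effective_batch_size}, lr {scaled_lr}')
```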

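Likewise, the remark about keeping a separate batch normalization coefficient per GPU can be pictured with a minimal sketch: each device holds its own `nn.BatchNorm2d` replica and therefore its own running statistics, because every replica only normalizes its local shard of the minibatch. The layer size and shard shapes below are hypothetical.

```python
# Illustrative sketch: one BatchNorm replica per device, each with its own
# running_mean / running_var buffers (an assumption about how one might do it).
import torch
from torch import nn

n = torch.cuda.device_count()
devices = [torch.device(f'cuda:{i}') for i in range(n)] if n else [torch.device('cpu')]

replicas = [nn.BatchNorm2d(6).to(d) for d in devices]
for bn, d in zip(replicas, devices):
    bn(torch.randn(32, 6, 28, 28, device=d))  # each replica sees only its local shard

print([bn.running_mean[:3] for bn in replicas])  # statistics differ per device
```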

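Finally, the second hunk refers to a LeNet variant defined from scratch so that parameter exchange and synchronization can be shown explicitly. The actual initialization code is collapsed in this diff view; the block below is only a guess at its general shape (plain tensors rather than `nn.Module` layers), with every tensor size hypothetical.

```python
# Hypothetical sketch of a LeNet-style model kept as plain tensors, so the
# parameters can later be copied to each GPU and synchronized by hand.
import torch
from torch.nn import functional as F

scale = 0.01
W1 = torch.randn(20, 1, 3, 3) * scale
b1 = torch.zeros(20)
W2 = torch.randn(50, 20, 5, 5) * scale
b2 = torch.zeros(50)
W3 = torch.randn(800, 128) * scale
b3 = torch.zeros(128)
W4 = torch.randn(128, 10) * scale
b4 = torch.zeros(10)
params = [W1, b1, W2, b2, W3, b3, W4, b4]

def lenet(X, params):
    h1 = F.relu(F.conv2d(X, params[0], params[1]))
    h1 = F.avg_pool2d(h1, kernel_size=2, stride=2)
    h2 = F.relu(F.conv2d(h1, params[2], params[3]))
    h2 = F.avg_pool2d(h2, kernel_size=2, stride=2)
    h2 = h2.reshape(h2.shape[0], -1)
    h3 = F.relu(h2 @ params[4] + params[5])
    return h3 @ params[6] + params[7]

print(lenet(torch.randn(4, 1, 28, 28), params).shape)  # torch.Size([4, 10])
```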