Avoid modifying rebuild buckets state in no_grad context (pytorch#54159)
Summary:
Pull Request resolved: pytorch#54159

See pytorch#54059 for discussion.

In short, users might want to run evaluation on a single rank
in `torch.no_grad()` mode. When this happens, we need to make
sure that we skip all rebuild-bucket logic, because the forward
pass only runs on one rank and not all peers would join the
bucket-configuration sync communication.
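For context, a minimal sketch of the scenario described above (assumed setup: the
process group and the DDP-wrapped model are already initialized; the helper name
and arguments below are illustrative, not part of this change):

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def evaluate_on_rank_zero(ddp_model: DDP, val_batch: torch.Tensor):
        # Only rank 0 enters forward() here; the other ranks never reach it,
        # so no collective call (such as the bucket-configuration sync) can
        # be issued from inside this block without stalling the job.
        if dist.get_rank() == 0:
            with torch.no_grad():
                # Inside no_grad, torch.is_grad_enabled() is False, so the
                # guarded _rebuild_buckets() call in forward() is skipped.
                return ddp_model(val_batch)
        return None

Without the torch.is_grad_enabled() guard added in this commit, rank 0 would
attempt the bucket rebuild while its peers never join the corresponding sync.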

Test Plan: Imported from OSS

Reviewed By: zhaojuanmao

Differential Revision: D27119666

Pulled By: mrshenli

fbshipit-source-id: 4b2f8cce937cdd893e89d8d10c9267d255ba52ea
mrshenli authored and facebook-github-bot committed Mar 18, 2021
1 parent fef0219 commit ef9ee46
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion torch/nn/parallel/distributed.py
@@ -734,7 +734,7 @@ def forward(self, *inputs, **kwargs):
         # call _rebuild_buckets before the peak memory usage increases
         # during forward computation.
         # This should be called only once during whole training period.
-        if self.reducer._rebuild_buckets():
+        if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
             logging.info("Reducer buckets have been rebuilt in this iteration.")

         if self.require_forward_param_sync:
