Avoid modifying rebuild buckets state in no_grad context (pytorch#54159)
Summary:
Pull Request resolved: pytorch#54159

See pytorch#54059 for discussion.

In short, users might want to run evaluation on a single rank
in `torch.no_grad()` mode. When this happens, we need to make
sure that we skip all rebuild-bucket logic, because the forward
pass only runs on one rank and not all peers would join the
bucket-configuration sync communication.
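For context, a minimal sketch of the scenario described above (assumed setup: the
process group and the DDP-wrapped model are already initialized; the helper name
and arguments below are illustrative, not part of this change):

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def evaluate_on_rank_zero(ddp_model: DDP, val_batch: torch.Tensor):
        # Only rank 0 enters forward() here; the other ranks never reach it,
        # so no collective call (such as the bucket-configuration sync) can
        # be issued from inside this block without stalling the job.
        if dist.get_rank() == 0:
            with torch.no_grad():
                # Inside no_grad, torch.is_grad_enabled() is False, so the
                # guarded _rebuild_buckets() call in forward() is skipped.
                return ddp_model(val_batch)
        return None

Without the torch.is_grad_enabled() guard added in this commit, rank 0 would
attempt the bucket rebuild while its peers never join the corresponding sync.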

Test Plan: Imported from OSS

Reviewed By: zhaojuanmao

Differential Revision: D27119666

Pulled By: mrshenli

fbshipit-source-id: 4b2f8cce937cdd893e89d8d10c9267d255ba52ea
mrshenli authored and facebook-github-bot committed Mar 18, 2021
1 parent fef0219 commit ef9ee46
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion torch/nn/parallel/distributed.py
@@ -734,7 +734,7 @@ def forward(self, *inputs, **kwargs):
         # call _rebuild_buckets before the peak memory usage increases
         # during forward computation.
         # This should be called only once during whole training period.
-        if self.reducer._rebuild_buckets():
+        if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
             logging.info("Reducer buckets have been rebuilt in this iteration.")

         if self.require_forward_param_sync:
