Commit
Divide the loss by update_freq when using gradient accumulation. (facebookresearch#1833)

* Follow fairseq and divide the loss by update_freq when using gradient accumulation.
* Change update_freq default.
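
The effect of this scaling can be illustrated with a minimal sketch. This is not the code from this commit. It assumes a toy 1-D model `y_hat = w * x` with a mean-squared-error loss and equally sized micro-batches, and the helper names `grad_mse` and `accumulated_grad` are hypothetical. Dividing each micro-batch loss by `update_freq` scales its gradient by the same factor, so the accumulated gradient matches the gradient of one large batch instead of growing with the number of accumulation steps.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, micro_batches, update_freq):
    """Sum micro-batch gradients, each scaled by 1 / update_freq."""
    total = 0.0
    for xs, ys in micro_batches:
        # Dividing the loss by update_freq divides its gradient by update_freq.
        total += grad_mse(w, xs, ys) / update_freq
    return total


# Toy data (illustrative values only).
w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

update_freq = 2
micro_batches = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

full = grad_mse(w, xs, ys)                            # one batch of 4 samples
acc = accumulated_grad(w, micro_batches, update_freq)  # 2 micro-batches of 2
unscaled = sum(grad_mse(w, bx, by) for bx, by in micro_batches)

print(full, acc, unscaled)
```

In this sketch the scaled accumulation reproduces the full-batch gradient, while the unscaled sum is `update_freq` times larger, which would act like an unintended increase in the learning rate. The equivalence is exact only when the micro-batches have equal size.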