Support fp32 gradaccum for bf16 model (microsoft#2566)
* Allow a bf16 model with an fp32 gradient accumulation datatype
* Allow fp32 gradient accumulation with a bfloat16 model in amp mode
* Alternative fix for the gradient accumulation type mismatch: in the ZeRO optimizer case, the gradient accumulation type should match the model data type

Co-authored-by: Olatunji Ruwase <[email protected]>
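A minimal sketch (not DeepSpeed's actual implementation, just an assumed illustration) of why an fp32 accumulation buffer matters for a bf16 model: summing many small bf16 gradients directly loses precision, while accumulating into an fp32 buffer keeps the running sum accurate.

```python
import torch

# Hypothetical example: a bf16 parameter paired with a separate fp32
# gradient-accumulation buffer, as motivated by the commit above.
param = torch.nn.Parameter(torch.zeros(4, dtype=torch.bfloat16))
fp32_grad_accum = torch.zeros(4, dtype=torch.float32)

# Accumulate many tiny micro-batch gradients.
for _ in range(1000):
    micro_grad = torch.full((4,), 1e-4, dtype=torch.bfloat16)
    fp32_grad_accum += micro_grad.float()  # upcast, then accumulate in fp32

# fp32_grad_accum is close to 0.1; accumulating in bf16 instead would
# drift noticeably because bf16 has only 8 mantissa bits.
```

The commit's point is that the optimizer must treat these two dtypes consistently: with ZeRO, the gradient accumulation dtype is kept equal to the model dtype, while the non-ZeRO path may accumulate in fp32.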