Tags: mwiacx/NeMo
Add option for mutex timeout in distributed optimizer backward hook (NVIDIA#9087)
  * Tim: Add option for timeout in distopt callback mutex
    Signed-off-by: Jaemin Choi <[email protected]>
  * Replace parent's _lock
    Signed-off-by: Jaemin Choi <[email protected]>
  * Revert "Replace parent's _lock"
    This reverts commit 972d1b6.
    Signed-off-by: Jaemin Choi <[email protected]>
  * Raise RuntimeError when timeout
    Signed-off-by: Jaemin Choi <[email protected]>
  * Change RuntimeError to print
    Signed-off-by: Jaemin Choi <[email protected]>
  ---------
  Signed-off-by: Jaemin Choi <[email protected]>
  Co-authored-by: Jaemin Choi <[email protected]>
update github raw content link (NVIDIA#8517) Signed-off-by: Chen Cui <[email protected]>
Update Apex install command in Dockerfile (NVIDIA#7794)
  * move core install to /workspace (NVIDIA#7706)
    Signed-off-by: Abhinav Khattar <[email protected]>
  * update apex install in dockerfile
    Signed-off-by: eharper <[email protected]>
  * use fetch head
    Signed-off-by: eharper <[email protected]>
  ---------
  Signed-off-by: Abhinav Khattar <[email protected]>
  Signed-off-by: eharper <[email protected]>
  Co-authored-by: Abhinav Khattar <[email protected]>
Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) Signed-off-by: Tim Moon <[email protected]>
Rename r1.19.0 -> r1.19.1 Signed-off-by: Igor Gitman <[email protected]>