Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
deep-learning optimizer pytorch artificial-intelligence moe resnet vit diffusion mae fairseq cuda-programming bert-model gpt2 transformer-xl timm convnext adan llms dreamfusion llm-training
Updated Jul 2, 2024 - Python