https://arxiv.org/abs/2301.08984
SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction (Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, Fan Yang, Saeed Maleki, Yi Zhu, Xu Cao, Cheng Li, Mao Yang, Lintao Zhang, Lidong Zhou)
Optimizes distributed parallelization. Interesting that the results are compared against Megatron, DeepSpeed, and Alpa as baselines.
#distributed_training