Warmup uses a dynamic learning rate (typically the lr first increases and then decreases; see the sketch after this list):
- Keeping the lr small at the start helps mitigate overfitting to the first few batches; early in training the model is still unfamiliar with the data, and a large learning rate may wreck the pretrained weights.
- Keeping the lr small in the later stage helps the model stay stable; once training has stabilized, a large learning rate may disrupt the model and cause it to jump out of the optimum it has reached.
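To make that shape concrete, here is a minimal sketch of a linear-warmup-then-linear-decay lr multiplier; the step counts 100 and 1000 are made-up illustrative values, not from the original code:

# Illustrative lr multiplier: ramp up linearly during warmup, then decay linearly.
num_warmup_steps = 100   # assumed value, for illustration only
total_steps = 1000       # assumed value, for illustration only

def lr_multiplier(step):
    if step < num_warmup_steps:
        # warmup phase: grow from 0 toward the base lr
        return step / max(1, num_warmup_steps)
    # decay phase: shrink from the base lr back toward 0
    return max(0.0, (total_steps - step) / max(1, total_steps - num_warmup_steps))

for step in (0, 50, 100, 500, 1000):
    print(step, round(lr_multiplier(step), 3))  # 0.0, 0.5, 1.0, 0.556, 0.0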
Common warmup variants include constant warmup, linear warmup, and cosine warmup. The transformers library ships schedulers for all of these (get_constant_schedule_with_warmup, get_linear_schedule_with_warmup, get_cosine_schedule_with_warmup), so they can be used directly.
First, set up the warmup scheduler. Code example:
from transformers import (get_constant_schedule_with_warmup,
                          get_cosine_schedule_with_warmup,
                          get_linear_schedule_with_warmup)

# Set up the warmup scheduler according to the warmup configuration.
total_steps = len(train_texts) // batch_size * epoch
# warmup_step_num may be an absolute step count (int) or a fraction of total_steps (float).
num_warmup_steps = self.warmup_step_num if isinstance(self.warmup_step_num, int) else \
    int(total_steps * self.warmup_step_num)
assert num_warmup_steps <= total_steps, \
    'num_warmup_steps {} is too large, more than total_steps {}'.format(num_warmup_steps, total_steps)
if self.warmup_type == 'linear':
    warmup_scheduler = get_linear_schedule_with_warmup(self.optimizer, num_warmup_steps, total_steps)
elif self.warmup_type == 'cosine':
    warmup_scheduler = get_cosine_schedule_with_warmup(self.optimizer, num_warmup_steps, total_steps)
else:
    warmup_scheduler = get_constant_schedule_with_warmup(self.optimizer, num_warmup_steps)
Then, inside each training step, call warmup_scheduler.step() so that the lr keeps being updated.
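A minimal sketch of where that call sits in a training loop (model, train_dataloader, and the loss computation are hypothetical placeholders, not from the original code):

# Sketch of a training loop; model and train_dataloader are assumed placeholders.
for _ in range(epoch):
    for batch in train_dataloader:
        self.optimizer.zero_grad()
        loss = model(**batch).loss      # forward pass
        loss.backward()                 # backward pass
        self.optimizer.step()           # parameter update with the current lr
        warmup_scheduler.step()         # advance the schedule so the next step uses the new lr

Note that warmup_scheduler.step() is called after self.optimizer.step(), which is the order PyTorch expects for lr schedulers.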
For the complete code, please refer to the source code below: