Commit

readme
lucidrains authored Mar 6, 2023
1 parent d95852f commit 9c7c3ce
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -10,6 +10,8 @@ In regards to learning rate and weight decay, the authors write in section 5 - `

The authors use the same learning rate schedule for Lion as AdamW in the paper. Nevertheless, they observe a larger gain when using a cosine decay schedule, compared to a reciprocal square-root schedule.
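To make the two schedule shapes concrete, here is a minimal sketch in plain Python. The function names, the `warmup_steps` parameter, and the exact reciprocal square-root form are assumptions for illustration; they are not taken from the paper or from this repository.

```python
import math

def cosine_decay(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 down to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def reciprocal_sqrt(step, peak_lr, warmup_steps=1000):
    """Hold peak_lr until warmup_steps, then decay proportionally to 1/sqrt(step).

    One common form of the schedule (assumed here); other variants exist.
    """
    return peak_lr * math.sqrt(warmup_steps / max(step, warmup_steps))
```

The cosine schedule reaches `min_lr` at the end of training, while the reciprocal square-root schedule never reaches zero and decays more slowly late in training.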

The authors recommend using betas of `(0.95, 0.98)` if one encounters instability during fine-tuning. This was <a href="https://github.com/lucidrains/lion-pytorch/issues/13#issuecomment-1455123143">corroborated by a researcher</a>.
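To show where the two betas enter, here is a minimal sketch of a single Lion update on plain Python floats, following the update rule described in the paper. It is an illustration only, not this repository's implementation; `lion_step` and `sign` are hypothetical helper names.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def lion_step(params, grads, momentum, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
    """Apply one Lion update to lists of scalar parameters.

    beta1 interpolates momentum and gradient for the signed update;
    beta2 controls how quickly the stored momentum tracks the gradient.
    """
    beta1, beta2 = betas
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Signed update direction from the beta1 interpolation
        update = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay, then a fixed-magnitude step of size lr
        p = p * (1 - lr * weight_decay) - lr * update
        # Momentum is refreshed with beta2 after the update is taken
        m = beta2 * m + (1 - beta2) * g
        new_params.append(p)
        new_momentum.append(m)
    return new_params, new_momentum

# Example call with the betas suggested for unstable fine-tuning
params, momentum = lion_step([1.0, -1.0], [0.5, -2.0], [0.0, 0.0],
                             lr=0.1, betas=(0.95, 0.98))
```

Raising beta1 from 0.9 to 0.95 gives the accumulated momentum more weight relative to the current gradient when the sign is taken, and lowering beta2 from 0.99 to 0.98 lets the stored momentum follow recent gradients more quickly.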

Update: Lion seems to work for my local enwik8 autoregressive language modeling

Update 2: <a href="https://api.wandb.ai/links/lucidrains/d4v6c8sl">experiments</a>, Lion seems much worse than Adam if the learning rate is held constant
