just bait some researcher into doing a paper for gpt
lucidrains committed Oct 2, 2023
1 parent 7e0cd2d commit 9fc3023
Showing 1 changed file with 3 additions and 1 deletion.
README.md (4 changes: 3 additions & 1 deletion)
@@ -297,7 +297,7 @@ enc = Encoder(

https://arxiv.org/abs/2006.11527

-Proposes adding learned tokens, akin to CLS tokens, named memory tokens, that are passed through the attention layers alongside the input tokens.
+Proposes adding learned tokens, akin to CLS tokens, named memory tokens, that are passed through the attention layers alongside the input tokens. This setting is compatible with both encoder and decoder training.

```python
import torch
@@ -315,6 +315,8 @@ model = TransformerWrapper(
)
```
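
The collapsed lines in the diff hide the middle of this example. For reference, a minimal runnable sketch of the memory token setting via `TransformerWrapper`'s `num_memory_tokens` keyword (the hyperparameter values here are illustrative):

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    num_memory_tokens = 20, # 20 learned memory tokens ride along with the input tokens
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

x = torch.randint(0, 20000, (1, 1024))
logits = model(x) # memory tokens attend alongside the sequence, then are sliced off before the logits
```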

+Update: Meta AI researchers <a href="https://arxiv.org/abs/2309.16588">have found</a> that adding memory tokens (they call them register tokens) alleviates outliers, now suspected to be a pathology of attention networks that are unable to <a href="https://arxiv.org/abs/2306.12929">attend to nothing</a>.
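
A toy sketch of the "attend to nothing" intuition (illustrative standalone code, not the library's internals; the register token here is a hypothetical stand-in for a learned one): softmax must place probability mass somewhere, and an appended register token gives it a harmless place to put that mass when no content token is relevant.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d = 8
q = torch.randn(1, d)           # a query that matches none of the keys well
keys = torch.randn(4, d) * 0.1  # weakly relevant content tokens

# without a register token, softmax is forced to spread all attention over content tokens
attn = F.softmax(q @ keys.t() / d ** 0.5, dim = -1)
print(attn)

# with a register token appended to the keys, excess mass can be dumped on it instead
register = torch.randn(1, d)    # stand-in for a learned register token
attn_aug = F.softmax(q @ torch.cat((keys, register), dim = 0).t() / d ** 0.5, dim = -1)
print(attn_aug[:, -1])          # attention mass absorbed by the register token
```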

### Transformers Without Tears

<img src="./images/scalenorm.png"></img>