just bait some researcher into doing a paper for gpt
lucidrains committed Oct 2, 2023
1 parent 7e0cd2d commit 9fc3023
Showing 1 changed file with 3 additions and 1 deletion.
README.md (4 changes: 3 additions & 1 deletion)
@@ -297,7 +297,7 @@ enc = Encoder(

https://arxiv.org/abs/2006.11527

-Proposes adding learned tokens, akin to CLS tokens, named memory tokens, that are passed through the attention layers alongside the input tokens.
+Proposes adding learned tokens, akin to CLS tokens, named memory tokens, that are passed through the attention layers alongside the input tokens. This setting is compatible with both encoder and decoder training.

```python
import torch
@@ -315,6 +315,8 @@ model = TransformerWrapper(
)
```
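
The collapsed lines in the diff hide the middle of this example. For reference, a minimal runnable sketch of the memory token setting via `TransformerWrapper`'s `num_memory_tokens` keyword (the hyperparameter values here are illustrative):

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    num_memory_tokens = 20, # 20 learned memory tokens ride along with the input tokens
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

x = torch.randint(0, 20000, (1, 1024))
logits = model(x) # memory tokens attend alongside the sequence, then are sliced off before the logits
```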

+Update: Meta AI researchers <a href="https://arxiv.org/abs/2309.16588">have found</a> that adding memory tokens (they call them register tokens) alleviates outliers, now suspected to be a pathology of attention networks that are unable to <a href="https://arxiv.org/abs/2306.12929">attend to nothing</a>.
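
A toy sketch of the "attend to nothing" intuition (illustrative standalone code, not the library's internals; the register token here is a hypothetical stand-in for a learned one): softmax must place probability mass somewhere, and an appended register token gives it a harmless place to put that mass when no content token is relevant.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d = 8
q = torch.randn(1, d)           # a query that matches none of the keys well
keys = torch.randn(4, d) * 0.1  # weakly relevant content tokens

# without a register token, softmax is forced to spread all attention over content tokens
attn = F.softmax(q @ keys.t() / d ** 0.5, dim = -1)
print(attn)

# with a register token appended to the keys, excess mass can be dumped on it instead
register = torch.randn(1, d)    # stand-in for a learned register token
attn_aug = F.softmax(q @ torch.cat((keys, register), dim = 0).t() / d ** 0.5, dim = -1)
print(attn_aug[:, -1])          # attention mass absorbed by the register token
```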

### Transformers Without Tears

<img src="./images/scalenorm.png"></img>