This repository will house a visualization that will attempt to convey instant enlightenment of how Attention works to someone not working in artificial intelligence, with 3Blue1Brown as inspiration

TiagoZhang/attention

 
 

Dive into Deep Learning, redone by Quanta Magazine

Attention (wip)

This repository will house a visualization that attempts to convey an instant, intuitive understanding of how attention works in artificial intelligence. I believe this algorithm is one of the most important developments in the history of deep learning. We can possibly use it to solve, well, everything.

In my mind, one good intuitive visualization can bring more insight and understanding than long, expensive tutoring or courses.

Why does it work?

Attention has many interpretations, ranging from physics-based interpretations to speculation about its biological plausibility.

Update: Recently, three papers have concurrently closed in on a connection between self-attention and gradient descent, while investigating in-context learning properties of Transformers!

  1. Transformers learn in-context by gradient descent
  2. What learning algorithm is in-context learning? Investigations with linear models
  3. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
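To give a flavor of the connection these papers explore, here is a minimal toy sketch, not code from any of the papers. It assumes a simplified setting: in-context examples `(x_i, y_i)` for a linear regression problem, weights starting at zero, a single gradient-descent step, and attention without softmax (often called linear attention). Under those assumptions, the prediction after one gradient step equals an attention readout in which the keys are `x_i`, the values are `y_i`, and the query is the new input. All function and variable names are made up for illustration.

```python
# Toy check: one gradient-descent step on in-context linear regression
# gives the same prediction as unnormalized (linear) attention.
# Names are illustrative only; this is a simplified sketch, not the papers' setup.

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def gd_one_step_prediction(xs, ys, x_query, lr):
    """Start from w = 0, take one GD step on 0.5 * sum((w.x_i - y_i)^2), then predict."""
    dim = len(x_query)
    w = [0.0] * dim
    # Gradient: sum_i (w.x_i - y_i) * x_i. At w = 0 this is -sum_i y_i * x_i.
    grad = [sum((dot(w, x) - y) * x[j] for x, y in zip(xs, ys)) for j in range(dim)]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)

def linear_attention_prediction(xs, ys, x_query, lr):
    """Keys = x_i, values = y_i, query = x_query; scores are raw dot products (no softmax)."""
    scores = [dot(x, x_query) for x in xs]
    return lr * sum(s * y for s, y in zip(scores, ys))

# Hypothetical in-context examples and query.
xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
ys = [2.0, -1.0, 1.5]
x_query = [1.0, 1.0]
lr = 0.1

gd_pred = gd_one_step_prediction(xs, ys, x_query, lr)
attn_pred = linear_attention_prediction(xs, ys, x_query, lr)
print(gd_pred, attn_pred)  # the two values agree up to floating-point rounding
```

The two functions compute the same quantity, `lr * sum_i y_i * (x_i . x_query)`, just grouped differently: one builds the weight vector first, the other scores each example against the query first. Real Transformers use softmax, learned projections, and many layers, so treat this only as intuition for why the comparison to gradient descent is plausible.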

What has Attention accomplished?

Will keep adding to this list as time goes on

Other resources

Is it all we need?

No one really knows. All I know is, if we were to dethrone attention with a better algorithm, it is over. Part of what motivates me to do some scalable, 21st-century teaching is the hope that someone can find a way to improve on it, or find its replacement. It just takes one discovery!

Appreciation

Many thanks go to 3Blue1Brown for showing us that complex mathematics can be taught with such elegance and potency through visualizations.

Citations

@misc{vaswani2017attention,
    title   = {Attention Is All You Need},
    author  = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year    = {2017},
    eprint  = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
@article{Bahdanau2015NeuralMT,
    title   = {Neural Machine Translation by Jointly Learning to Align and Translate},
    author  = {Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio},
    journal = {CoRR},
    year    = {2015},
    volume  = {abs/1409.0473}
}
