*Image credit: Dive into Deep Learning, redone by Quanta Magazine*
This repository will house a visualization that attempts to convey instant enlightenment of how Attention works in the field of artificial intelligence. I believe this algorithm to be one of the most important developments in the history of deep learning, and that we may yet be able to use it to solve, well, everything.
In my mind, one good intuitive visualization can bring about more insight and understanding than lengthy, expensive tutoring or courses.
Attention has many interpretations, ranging from physics-based interpretations to speculations on biological plausibility.
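Until the visualization lands, here is a minimal numpy sketch of the scaled dot-product attention defined in the Vaswani et al. paper cited at the bottom of this README. This is my own illustration, not the repo's eventual visualization, and the function name and shapes are assumptions made for the example. Each query produces a softmax-weighted average of the values, weighted by how well that query matches each key:

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# A minimal illustrative sketch, not code from this repository.
import numpy as np

def attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_kv, d_k), V: (n_kv, d_v) -> output: (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how well each query matches each key
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax: attention weights
    return weights @ V                              # weighted average of the values

# Toy usage: 3 queries attending over 5 key/value pairs
rng = np.random.default_rng(0)
out = attention(rng.normal(size=(3, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 16)))
print(out.shape)  # (3, 16)
```

The 1/sqrt(d_k) scaling is there so the dot products do not grow with the key dimension and saturate the softmax.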
Update: recently, three papers have concurrently closed in on a connection between self-attention and gradient descent while investigating the in-context learning properties of Transformers! A small sketch of the core identity follows the list below.
- Transformers learn in-context by gradient descent
- What learning algorithm is in-context learning? Investigations with linear models
- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
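To make that connection concrete, here is a minimal numpy sketch; it is my own illustration rather than code from any of the three papers, and the variable names and learning rate are assumptions made for the example. For in-context linear regression, the prediction after a single gradient descent step from zero weights coincides exactly with the output of an unnormalized linear self-attention layer whose queries and keys are the inputs and whose values are the targets:

```python
# One gradient descent step on in-context linear regression == one pass of
# unnormalized linear attention. Illustrative sketch; names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 32                          # feature dimension, number of in-context examples
w_true = rng.normal(size=d)

context_x = rng.normal(size=(n, d))   # in-context inputs  x_i
context_y = context_x @ w_true        # in-context targets y_i
query_x = rng.normal(size=d)          # test query         x_q
lr = 0.1                              # gradient descent step size

# View 1: one GD step on L(w) = 0.5 * sum_i (w . x_i - y_i)^2, starting from w = 0
grad = -(context_y[:, None] * context_x).sum(axis=0)  # dL/dw evaluated at w = 0
w_one_step = -lr * grad                               # w after a single update
pred_gd = w_one_step @ query_x

# View 2: unnormalized linear attention with queries/keys = inputs, values = targets
attn_scores = context_x @ query_x                     # <x_i, x_q> for every example
pred_attn = lr * attn_scores @ context_y              # sum_i <x_i, x_q> * y_i

print(np.allclose(pred_gd, pred_attn))  # True: the two views give the same prediction
```

The papers above develop this far beyond the one-step identity, roughly relating stacked self-attention layers to iterated update steps, but this toy equivalence is the seed of the argument.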
A running list of domains where attention-based models have found success:
- Protein Folding
- Language
- Vision
- Image Segmentation
- Speech Recognition
- Symbolic Mathematics
- Midi Generation
- Theorem Proving
- Gene Expression
- Text to Image
- Attention-only Text to Image
- Text to Video
- Text to Video 2
- Code Generation
- Language+
- Protein Generation
- Multimodal Model
- Video Understanding
- Heart Disease Classification
- Weather Forecasting
- Text to Speech
- Few-Shot Visual Question Answering
- Generalist Agent
- Audio Generation from Raw Waveform
- Sample Efficient World Model
- Audio / Speech Generation
- Nucleic Acid / Protein Binding
- Generalizable Prompting for Robotic Arm Control
- Zero-shot Text to Speech
- Music Generation
- Designing Molecular Scissors for DNA
- Nucleic Language Model
I will keep adding to this list as time goes on.
No one really knows. All I know is that if attention were ever dethroned by a better algorithm, it would be over. Part of what motivates me to do some scalable, 21st-century teaching is the hope that someone may find a way to improve on it, or discover its replacement. It only takes one discovery!
A large thanks goes to 3Blue1Brown for showing us that complex mathematics can be taught with such elegance and potency through visualization.
```bibtex
@misc{vaswani2017attention,
    title         = {Attention Is All You Need},
    author        = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year          = {2017},
    eprint        = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}
```

```bibtex
@article{Bahdanau2015NeuralMT,
    title   = {Neural Machine Translation by Jointly Learning to Align and Translate},
    author  = {Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio},
    journal = {CoRR},
    year    = {2015},
    volume  = {abs/1409.0473}
}
```