PyTorch implementation of the "Attention Is All You Need" paper.
Since its introduction, the Transformer architecture has become a dominant model in natural language processing, achieving remarkable results. The heart of this architecture is the attention module, which lets the network capture long-range dependencies while remaining highly parallelizable. However, several works have shown that the components achieve superior results together, and dropping some of them can degrade the output.
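For a quick refresher, the attention module computes scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Below is a minimal PyTorch sketch; the function name and tensor shapes are illustrative, not this repo's API:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); shapes assumed for illustration
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep softmax gradients stable
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        # Block masked positions (e.g. padding or future tokens)
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return torch.matmul(weights, v), weights
```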
Some intuitive blog posts on the Transformer:
- The illustrated transformer
- Transformer: A Novel Neural Network Architecture for Language Understanding
- The Annotated Transformer
- Python = 3.7
- PyTorch = 1.9
git clone https://github.com/Hojjat-Mokhtarabadi/Attention-is-all-you-need.git
cd Attention-is-all-you-need
python3 -m pip install -r requirements.txt
cd src
bash ../run.sh