https://arxiv.org/abs/2106.01540
Luna: Linear Unified Nested Attention (Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer)
Efficient attention. The core idea seems to be using a fixed-length query bank to aggregate the keys/values, with the query bank itself also being predicted by the attention layer. Hmm.
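To make the mechanism concrete, here is a minimal sketch of the two nested attention steps, as I understand them. It is not the authors' implementation. It assumes a single head with no learned projections, residuals, or layer norm, and all function names (`attention`, `luna_layer`, `random_vectors`) are made up for illustration. The query bank `p` has length `l`, the input `x` has length `n`, and each step costs O(l·n) rather than O(n²).

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    # Plain scaled dot-product attention on lists of vectors
    # (hypothetical helper: no projections, no heads).
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(dim)])
    return out


def luna_layer(x, p):
    # Step 1 ("pack"): the l bank vectors query the n inputs,
    # aggregating the keys/values into l summary vectors.
    packed = attention(p, x, x)
    # Step 2 ("unpack"): the n inputs query the l summaries.
    y = attention(x, packed, packed)
    # The packed summaries become the query bank for the next layer,
    # which is the sense in which the layer predicts the bank.
    return y, packed


def random_vectors(count, dim, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
n, l, d = 16, 4, 8              # sequence length, bank length, model dim
x = random_vectors(n, d, rng)
p = random_vectors(l, d, rng)   # stands in for a learned initial bank

y, p_next = luna_layer(x, p)          # first layer
y2, p_next2 = luna_layer(y, p_next)   # second layer reuses the predicted bank
print(len(y), len(y[0]), len(p_next), len(p_next[0]))
```

The output sequence keeps length `n` while the bank stays at length `l`, so stacking layers never builds an n×n attention matrix.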
#efficient_attention