Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel

Tsai, Y. H. H., Bai, S., Yamada, M., Morency, L. P., & Salakhutdinov, R. (2019, November). Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 4335-4344).