Transformers have become the go-to architecture for Natural Language Processing (NLP) and are starting to gain traction in the computer vision community. However, despite their success and widespread adoption, they have one major drawback: the computation and memory requirements of their self-attention layers grow quadratically with the length of the input sequence. Hence, training transformer models from scratch is a very resource-intensive task.
In this session, we take a look at the current state of research into efficient transformer layers, i.e. reformulations of the vanilla transformer layer whose computation and/or memory requirements are O(n log n) or even O(n) in the input length n. If your knowledge of transformers or complexity theory is a bit rusty, do not worry: the session will start with a short refresher on both topics so you can make the most of it.
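
To make the quadratic growth concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the material of the session itself. The first function is vanilla scaled dot-product attention, which builds an n-by-n score matrix. The other two functions show one known family of O(n) reformulations, kernelized "linear" attention; the function names and the feature map elu(x) + 1 are choices made for this example. Note that kernelized attention is a different attention function from softmax attention, not an exact rewrite of it.

```python
import numpy as np


def softmax_attention(Q, K, V):
    """Vanilla scaled dot-product attention.

    Materializes an (n, n) score matrix, so time and memory are O(n^2).
    Returns the output and the shape of the score matrix.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # shape (n, n)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, scores.shape


def feature_map(x):
    # elu(x) + 1: a positive feature map (an illustrative assumption).
    return np.where(x > 0, x + 1.0, np.exp(x))


def kernel_attention_quadratic(Q, K, V):
    """Kernelized attention written the naive way, with an (n, n) matrix."""
    A = feature_map(Q) @ feature_map(K).T            # shape (n, n)
    return (A @ V) / A.sum(axis=-1, keepdims=True)


def kernel_attention_linear(Q, K, V):
    """Same result as above, but reordered so no (n, n) matrix is formed.

    Uses associativity: (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V).
    Cost is O(n * d * d_v) instead of O(n^2 * d).
    """
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V                                 # shape (d, d_v)
    z = phi_k.sum(axis=0)                            # shape (d,)
    return (phi_q @ kv) / (phi_q @ z)[:, None]


rng = np.random.default_rng(0)
n, d = 16, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

out, score_shape = softmax_attention(Q, K, V)
same = np.allclose(kernel_attention_quadratic(Q, K, V),
                   kernel_attention_linear(Q, K, V))
```

Doubling n quadruples the number of entries in the score matrix of the vanilla layer, while the intermediate `kv` matrix in the linear variant keeps its (d, d_v) shape regardless of n.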