In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length is smaller than the representation dimensionality
It shows exceptional greatness to make use of wise people.. skillfully make those whom nature made superior your servants.. through your words, as many wise people as advised you will speak. You'll gain a reputation as an oracle through the sweat of others.