Relevant Video
- Transformers helped solve an initial problem: handling context over long sequences. Future challenges lie in reducing computational complexity, improving controllability, and building domain-specific foundation models.
- The problems are mainly memory-related: as the context grows longer, the transformer starts to forget the older material.
- The initial seminal idea was attention, which emerged gradually over the preceding decade and culminated in the transformer architecture.
- 2014: Neural Machine Translation
- 2017: "Attention Is All You Need"
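The attention idea mentioned above can be sketched as scaled dot-product attention, the core operation of the 2017 paper. This is a minimal NumPy sketch, not the full multi-head transformer layer; the array shapes are illustrative assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, and values.
    d_k = Q.shape[-1]
    # Similarity of each query to every key, scaled to keep softmax stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns raw scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # hypothetical sequence of 4 tokens, dim 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Because every query attends to every key, compute and memory grow quadratically with sequence length, which is the complexity problem the notes point to.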