Some notes on transformers, to reference later.
In vague terms, a transformer does as titled: it takes a sequence and transforms it into another sequence.
A transformer is largely composed of encoders and/or decoders. Earlier sequence-to-sequence models built these coders out of RNNs, neural networks that feed their own output back in as input recursively; transformers drop that recurrence and use attention instead, looking at the whole sequence at once.
The encoder takes some input and encodes it into whatever vector representation works best. This vector, referred to as our context, is passed to the decoder, which turns our context back into a sequence.
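A minimal sketch of that encode-then-decode flow. Everything here (the vocab, the bag-of-words "encoding", the function names) is made up for illustration; a real transformer learns its representations rather than hand-coding them:

```python
# Toy illustration of the encoder/decoder idea, not a real transformer.
# The vocab and both functions are invented for this sketch.

VOCAB = {"hello": 0, "world": 1, "<eos>": 2}
INV_VOCAB = {i: w for w, i in VOCAB.items()}

def toy_encode(tokens):
    # "Encode" the sequence into a fixed-size context vector.
    # Here: a bag-of-words count vector, standing in for whatever
    # representation a real encoder would learn.
    context = [0.0] * len(VOCAB)
    for t in tokens:
        context[VOCAB[t]] += 1.0
    return context

def toy_decode(context):
    # "Decode" the context back into a sequence.
    # Here: emit each token with a nonzero count, then stop.
    out = [INV_VOCAB[i] for i, c in enumerate(context)
           if c > 0 and INV_VOCAB[i] != "<eos>"]
    out.append("<eos>")
    return out

context = toy_encode(["hello", "world"])
print(context)              # [1.0, 1.0, 0.0]
print(toy_decode(context))  # ['hello', 'world', '<eos>']
```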
Terms/Concepts to internalize:
- Context Length
- Word Embeddings
- Softmax (quick sketch after this list)
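Softmax is the one term on this list with a concrete formula: it exponentiates each raw score and divides by the sum, so every output is positive and they all sum to 1, i.e. a probability distribution. A minimal implementation (the max-subtraction is a standard numerical-stability trick and doesn't change the result):

```python
import math

def softmax(scores):
    # Subtract the max score so math.exp never overflows on large inputs.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))
# -> roughly [0.659, 0.242, 0.099]; positive, sums to 1
```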