This is a simplified implementation of attention and doesn’t cover all the details. For example, transformer models typically use a more sophisticated form of attention called “multi-head attention”, which uses multiple sets of attention weights to focus on different parts of the input. They also incorporate positional information, usually by adding positional encodings to the input embeddings (or, in some variants, directly to the attention scores), so the model can account for the order of words in a sentence.
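To make the multi-head idea concrete, here is a minimal sketch of multi-head self-attention in PyTorch. It is not the exact implementation used in any particular transformer library; the class and parameter names (`MultiHeadAttention`, `embed_dim`, `num_heads`) are chosen for illustration. The key point is that the embedding is split into several smaller “heads”, each head runs its own scaled dot-product attention, and the head outputs are concatenated and projected back to the original dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Illustrative multi-head self-attention (a sketch, not a reference implementation)."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Linear projections for queries, keys, and values
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        # Final projection that mixes the concatenated head outputs
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, embed_dim)
        batch, seq_len, embed_dim = x.shape
        # Project, then split the embedding into heads: (batch, num_heads, seq_len, head_dim)
        q = self.q_proj(x).view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention, computed independently for each head
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        weights = F.softmax(scores, dim=-1)
        attended = weights @ v  # (batch, num_heads, seq_len, head_dim)
        # Concatenate the heads back together and apply the output projection
        attended = attended.transpose(1, 2).contiguous().view(batch, seq_len, embed_dim)
        return self.out_proj(attended)


# Example usage: 2 sequences of 5 tokens with 16-dimensional embeddings and 4 heads
x = torch.randn(2, 5, 16)
mha = MultiHeadAttention(embed_dim=16, num_heads=4)
print(mha(x).shape)  # torch.Size([2, 5, 16])
```

Because each head attends over a different learned projection of the input, one head might focus on nearby words while another tracks longer-range dependencies; the output projection then combines these views.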
For a full implementation of a transformer model, including the attention mechanism, you might want to refer to The Annotated Transformer by the Harvard NLP group, which provides a step-by-step walkthrough of the model with accompanying PyTorch code.