When training an encoder-decoder transformer on positionally encoded embeddings, should the target (`tgt`) embeddings fed into the decoder also be positionally encoded? If so, wouldn't the predicted/decoded embeddings that the model outputs then also be positionally encoded?
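
To make the question concrete, here is a minimal sketch of the setup I mean, assuming PyTorch's `nn.Transformer` with the standard sinusoidal encoding; the `PositionalEncoding` class, dimensions, and variable names are all illustrative, not from any particular codebase:

```python
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the encoding for each position.
        return x + self.pe[: x.size(1)]


vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
pos_enc = PositionalEncoding(d_model)
transformer = nn.Transformer(d_model=d_model, batch_first=True)
to_logits = nn.Linear(d_model, vocab_size)  # decoder states -> token logits

src = torch.randint(0, vocab_size, (2, 10))  # source token ids
tgt = torch.randint(0, vocab_size, (2, 9))   # target token ids (decoder input;
                                             # teacher-forcing shift omitted)

# Positional encoding is applied to BOTH the encoder (src) input
# and the decoder (tgt) input...
src_in = pos_enc(embed(src))
tgt_in = pos_enc(embed(tgt))

out = transformer(src_in, tgt_in)  # (2, 9, d_model) decoder hidden states
logits = to_logits(out)            # (2, 9, vocab_size)

# ...but the loss compares logits against raw token ids, so the training
# target is never itself a positionally encoded embedding.
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), tgt.reshape(-1))
```

In this sketch the encoding is only ever added to the decoder's *inputs*; the model's outputs are hidden states projected to logits and compared against token ids, which is the part I want to confirm.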