Created on November 26, 2024
A new preprint on a distributed sign momentum method for pre-training Transformer models.