The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Paper • 2501.18965 • Published 9 days ago • 5
SGD with Clipping is Secretly Estimating the Median Gradient Paper • 2402.12828 • Published Feb 20, 2024
Descent Only Collection Papers, posts and resources related to optimization for ML. • 6 items • Updated Mar 13, 2024