Rethinking Optimization and Architecture for Tiny Language Models Paper • 2402.02791 • Published Feb 5, 2024 • 13