Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 104
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning Paper • 2301.13688 • Published Jan 31, 2023 • 8