SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations Paper • 2109.07424 • Published Sep 15, 2021 • 1
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 104