Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9 • 37
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24 • 22
DiLoCo: Distributed Low-Communication Training of Language Models Paper • 2311.08105 • Published Nov 14, 2023 • 14