K2, LLM360's most powerful, scaled model series.
We encountered two major loss spikes while training K2.
We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
Loss spikes are still not well understood. By making these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge on this topic.
Checkpoints | |
Checkpoint 160 | Checkpoint 170 |
Checkpoint 162 | Checkpoint 172 |
Checkpoint 164 | Checkpoint 174 |
Checkpoint 166 | Checkpoint 176 |
Checkpoint 168 | Checkpoint 178 |
[to find all branches: git branch -a]
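The branch listing above can also be scripted. The following is a minimal sketch, assuming the repository has already been cloned locally with git. The helper names `parse_remote_branches` and `list_remote_branches` are hypothetical and not part of any LLM360 tooling, and the sketch makes no assumption about how the checkpoint branches are actually named.

```python
import subprocess


def parse_remote_branches(git_output: str) -> list[str]:
    """Extract remote branch names from the text printed by `git branch -a`."""
    branches = []
    for line in git_output.splitlines():
        # Remove indentation and the "*" marker on the current branch.
        name = line.strip().lstrip("* ").strip()
        # Remote-tracking branches are listed as "remotes/<remote>/<branch>".
        if not name.startswith("remotes/"):
            continue
        # Skip the symbolic HEAD pointer, e.g. "remotes/origin/HEAD -> origin/main".
        if " -> " in name:
            continue
        # Drop the "remotes/<remote>/" prefix and keep only the branch name.
        branches.append(name.split("/", 2)[2])
    return branches


def list_remote_branches(repo_dir: str) -> list[str]:
    """Run `git branch -a` inside a local clone and return its remote branches."""
    result = subprocess.run(
        ["git", "branch", "-a"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_remote_branches(result.stdout)
```

Calling `list_remote_branches` with the path to a local clone returns the branch names, each of which can then be passed to `git checkout` to switch to the corresponding checkpoint.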
All evaluations can be viewed on our Weights & Biases project page.
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.
title={LLM360-K2-65B: Scaling Up Open and Transparent Language Models},
author={The LLM360 Team},