---
datasets:
- EleutherAI/pile
language:
- en
---
A Based model that normalizes the linear-attention output with LayerNorm instead of dividing by `QK.sum(-1)`, for better hardware efficiency.
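The following minimal NumPy sketch puts the two normalizations side by side. For readability it uses the quadratic (masked-matrix) form of causal linear attention, not the recurrent or chunked form a hardware-efficient kernel would use. The feature map (`elu(x) + 1`) is an assumed stand-in, since Based uses its own feature map, and all function names are hypothetical. `causal_linear_attention_sum_norm` divides by `QK.sum(-1)`. `causal_linear_attention_layernorm` drops that division and applies a parameter-free LayerNorm to the output instead.

```python
import numpy as np


def feature_map(x):
    # Hypothetical stand-in feature map: elu(x) + 1, which keeps features positive.
    # Based uses its own feature map; this is only for illustration.
    return np.where(x > 0, x + 1.0, np.exp(x))


def causal_linear_attention_sum_norm(q, k, v, eps=1e-6):
    """Standard normalization: divide by the row sums of the causal QK matrix."""
    fq, fk = feature_map(q), feature_map(k)
    qk = np.tril(fq @ fk.T)  # (T, T) causal scores, zero above the diagonal
    return (qk @ v) / (qk.sum(-1, keepdims=True) + eps)


def layer_norm(x, eps=1e-5):
    """Parameter-free LayerNorm over the last axis (no learned gain or bias)."""
    mean = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def causal_linear_attention_layernorm(q, k, v):
    """Variant: skip the QK.sum(-1) denominator and LayerNorm the output instead."""
    fq, fk = feature_map(q), feature_map(k)
    qk = np.tril(fq @ fk.T)
    return layer_norm(qk @ v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, dv = 5, 4, 3  # sequence length, key dim, value dim
    q = rng.standard_normal((T, d))
    k = rng.standard_normal((T, d))
    v = rng.standard_normal((T, dv))
    print(causal_linear_attention_sum_norm(q, k, v))
    print(causal_linear_attention_layernorm(q, k, v))
```

In the sum-normalized version, each output row is a convex combination of past values, so a constant `v` passes through unchanged. The LayerNorm variant instead rescales each output row to zero mean and unit variance, with no learned gain or bias in this sketch.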