---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-2
language:
- sv
library_name: transformers
pipeline_tag: text-generation
---

This is a model with the same architecture as SmolLM2-135M, trained from scratch on the Swedish portion of FineWeb-2. It is intended as a baseline for my research and is probably rather bad for most purposes :)
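
The model should work with the standard `transformers` text-generation pipeline. A minimal sketch, assuming the usual causal-LM setup; the model id is a placeholder for this repository's Hub id, and the sampling settings are illustrative:

```python
from transformers import pipeline

# Placeholder id: replace with this repository's actual Hub id.
generator = pipeline("text-generation", model="<user>/<this-repo>")

# Swedish prompt; generation settings are illustrative, not tuned.
out = generator("Stockholm är", max_new_tokens=50, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```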
Training (see the sketch after this list):

- Epochs: 1
- Learning rate: 5e-4
- LR scheduler: cosine
- Warmup ratio: 0.05
- Per-device batch size: 1
- Hardware: 4× A100 (40 GB) GPUs
- Gradient accumulation steps: 64
- Effective batch size: 256 (1 × 64 × 4 GPUs)
- Max. context length: 8192 tokens
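
For reference, the hyperparameters above map roughly onto the following `transformers` `TrainingArguments`. This is a hedged sketch rather than the actual training script; the output path and the bf16 setting are assumptions:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; not the script actually used.
args = TrainingArguments(
    output_dir="smollm2-135m-sv",     # hypothetical output path
    num_train_epochs=1,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,    # per GPU
    gradient_accumulation_steps=64,   # 1 x 64 x 4 GPUs = effective batch 256
    bf16=True,                        # assumption: mixed precision on A100s
)
```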