This is a model with the same specifications as SmolLM2-135M, trained from scratch on the Swedish portion of Fineweb-2. It is intended as a baseline for my research and is probably rather bad for most purposes :)
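
The model loads with the standard `transformers` causal-LM API. A minimal generation sketch (the Swedish prompt is purely an illustration; the model ID is this card's repository name):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jekunz/smollm-135m-fineweb-swedish-from-scratch"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in bfloat16 to match the BF16 tensor type of the released weights.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Stockholm är", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```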

Training (see the configuration sketch below the list):

  • Epochs: 1
  • Learning rate: 5e-4
  • LR scheduler: Cosine
  • Warmup ratio: 0.05
  • Batch size: 1 (per device)
  • 4 A100 (40GB) GPUs
  • Gradient accumulation steps: 64
  • Effective batch size: 256
  • Max. context length: 8192 tokens
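
The hyperparameters above map onto a `transformers` `TrainingArguments` configuration roughly as follows. This is a hedged sketch, not the actual training script; the output directory is a placeholder:

```python
from transformers import TrainingArguments

# Settings as listed above. With 4 GPUs the effective batch size is
# 1 (per device) x 64 (gradient accumulation) x 4 (GPUs) = 256.
training_args = TrainingArguments(
    output_dir="smollm-135m-fineweb-swedish-from-scratch",  # placeholder
    num_train_epochs=1,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    bf16=True,  # matches the BF16 tensor type of the released weights
)
# The 8192-token max context length is applied when tokenizing/packing
# the dataset, not via TrainingArguments.
```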