---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-2
language:
- sv
library_name: transformers
pipeline_tag: text-generation
---

This is a model with the same architecture as SmolLM2-135M, trained from scratch on the Swedish portion of FineWeb-2. It is intended as a baseline for my research and is probably rather bad for most purposes :)
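
The model should work with the standard `transformers` text-generation pipeline. A minimal sketch, assuming the usual causal-LM setup; the model id is a placeholder for this repository's Hub id, and the sampling settings are illustrative:

```python
from transformers import pipeline

# Placeholder id: replace with this repository's actual Hub id.
generator = pipeline("text-generation", model="<user>/<this-repo>")

# Swedish prompt; generation settings are illustrative, not tuned.
out = generator("Stockholm är", max_new_tokens=50, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```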
Training (see the sketch after this list):

- Epochs: 1
- Learning rate: 5e-4
- LR scheduler: cosine
- Warmup ratio: 0.05
- Per-device batch size: 1
- Hardware: 4× A100 (40 GB) GPUs
- Gradient accumulation steps: 64
- Effective batch size: 256 (1 × 64 × 4 GPUs)
- Max. context length: 8192 tokens
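
For reference, the hyperparameters above map roughly onto the following `transformers` `TrainingArguments`. This is a hedged sketch rather than the actual training script; the output path and the bf16 setting are assumptions:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; not the script actually used.
args = TrainingArguments(
    output_dir="smollm2-135m-sv",     # hypothetical output path
    num_train_epochs=1,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,    # per GPU
    gradient_accumulation_steps=64,   # 1 x 64 x 4 GPUs = effective batch 256
    bf16=True,                        # assumption: mixed precision on A100s
)
```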