Here's a continued pre-trained model using the Finnish Wikipedia dataset. I still don't understand why no one in Finland has figured out that they could just do continued pre-training on existing models that are already supported by every frontend. I've seen Japanese models perform pretty well with that kind of continued pre-training, yet Finnish models are still trained from scratch, and the results are poor: compared to Llama 3 or Gemma 2 they fall far behind, and they can't even match Mistral 7B, a model from last year. Just stop wasting money on training models from scratch; use these better models as a base and train them on all your closed-source data I don't have access to. Thank you.
LoRA: mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B
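If you'd rather apply the adapter to the base model yourself instead of using the merged weights, a minimal sketch with transformers and peft follows; the prompt and generation settings are just placeholder assumptions.

```python
# Minimal sketch: load the base model and apply the LoRA adapter with peft.
# Assumes transformers, peft, and accelerate are installed; the prompt and
# sampling settings are illustrative, not from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-1B", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B")

# Plain completion, since this is a base model, not an instruct model.
inputs = tokenizer("Suomi on", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```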
Trained with regular LoRA (not quantized/QLoRA), with LoRA rank 128 and alpha set to 32. Trained for 1 epoch on an RTX 4090, taking about 12.5 hours.
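For reference, the setup looks roughly like the following with Unsloth and TRL. This is a sketch, not the exact training script: only the base model, rank 128, and alpha 32 come from above, while the dataset ID, sequence length, batch size, and learning rate are assumptions, and the exact SFTTrainer keyword arguments depend on the TRL version.

```python
# Rough sketch of the continued pre-training setup with Unsloth + TRL.
# Only r=128, lora_alpha=32, and the base model are from the card;
# the dataset ID and all other hyperparameters are assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,   # assumed
    load_in_4bit=False,    # regular LoRA, not QLoRA
)
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumed Finnish Wikipedia dump on the Hub.
dataset = load_dataset("wikimedia/wikipedia", "20231101.fi", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        gradient_accumulation_steps=8,   # assumed
        num_train_epochs=1,
        learning_rate=2e-4,              # assumed
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```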
So it does have some issues, but I could try training on Gemma 2 2B to see if that makes a better base for this (Gemma 2 is already better at Finnish than Llama 3), and maybe add more Finnish datasets.
## Evaluation
Model | Size | Type | FIN-bench (avg) | Without math (avg) |
---|---|---|---|---|
mpasila/Llama-3.2-Finnish-Wikipedia-1B | 1B | Base | 0.3170 | 0.4062 |
unsloth/Llama-3.2-1B | 1B | Base | 0.4029 | 0.3881 |
Finnish-NLP/llama-7b-finnish | 7B | Base | 0.2350 | 0.4203 |
LumiOpen/Viking-7B (1000B) | 7B | Base | 0.3721 | 0.4453 |
HPLT/gpt-7b-nordic-prerelease | 7B | Base | 0.3169 | 0.4524 |
### FIN-bench scores
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
bigbench_analogies | 0 | multiple_choice_grade | 0.4846 | ± | 0.0440 |
bigbench_arithmetic_1_digit_addition | 0 | multiple_choice_grade | 0.0300 | ± | 0.0171 |
bigbench_arithmetic_1_digit_division | 0 | multiple_choice_grade | 0.0435 | ± | 0.0435 |
bigbench_arithmetic_1_digit_multiplication | 0 | multiple_choice_grade | 0.0200 | ± | 0.0141 |
bigbench_arithmetic_1_digit_subtraction | 0 | multiple_choice_grade | 0.0700 | ± | 0.0256 |
bigbench_arithmetic_2_digit_addition | 0 | multiple_choice_grade | 0.2200 | ± | 0.0416 |
bigbench_arithmetic_2_digit_division | 0 | multiple_choice_grade | 0.0800 | ± | 0.0273 |
bigbench_arithmetic_2_digit_multiplication | 0 | multiple_choice_grade | 0.2400 | ± | 0.0429 |
bigbench_arithmetic_2_digit_subtraction | 0 | multiple_choice_grade | 0.1800 | ± | 0.0386 |
bigbench_arithmetic_3_digit_addition | 0 | multiple_choice_grade | 0.3300 | ± | 0.0473 |
bigbench_arithmetic_3_digit_division | 0 | multiple_choice_grade | 0.2100 | ± | 0.0409 |
bigbench_arithmetic_3_digit_multiplication | 0 | multiple_choice_grade | 0.3000 | ± | 0.0461 |
bigbench_arithmetic_3_digit_subtraction | 0 | multiple_choice_grade | 0.5500 | ± | 0.0500 |
bigbench_arithmetic_4_digit_addition | 0 | multiple_choice_grade | 0.2800 | ± | 0.0451 |
bigbench_arithmetic_4_digit_division | 0 | multiple_choice_grade | 0.2500 | ± | 0.0435 |
bigbench_arithmetic_4_digit_multiplication | 0 | multiple_choice_grade | 0.1500 | ± | 0.0359 |
bigbench_arithmetic_4_digit_subtraction | 0 | multiple_choice_grade | 0.4400 | ± | 0.0499 |
bigbench_arithmetic_5_digit_addition | 0 | multiple_choice_grade | 0.5100 | ± | 0.0502 |
bigbench_arithmetic_5_digit_division | 0 | multiple_choice_grade | 0.3000 | ± | 0.0461 |
bigbench_arithmetic_5_digit_multiplication | 0 | multiple_choice_grade | 0.3100 | ± | 0.0465 |
bigbench_arithmetic_5_digit_subtraction | 0 | multiple_choice_grade | 0.4000 | ± | 0.0492 |
bigbench_cause_and_effect_one_sentence | 0 | multiple_choice_grade | 0.5882 | ± | 0.0696 |
bigbench_cause_and_effect_one_sentence_no_prompt | 0 | multiple_choice_grade | 0.3922 | ± | 0.0690 |
bigbench_cause_and_effect_two_sentences | 0 | multiple_choice_grade | 0.4510 | ± | 0.0704 |
bigbench_emotions | 0 | multiple_choice_grade | 0.1938 | ± | 0.0313 |
bigbench_empirical_judgments | 0 | multiple_choice_grade | 0.3434 | ± | 0.0480 |
bigbench_general_knowledge | 0 | multiple_choice_grade | 0.2714 | ± | 0.0535 |
bigbench_hhh_alignment_harmless | 0 | multiple_choice_grade | 0.3966 | ± | 0.0648 |
bigbench_hhh_alignment_helpful | 0 | multiple_choice_grade | 0.3729 | ± | 0.0635 |
bigbench_hhh_alignment_honest | 0 | multiple_choice_grade | 0.3390 | ± | 0.0622 |
bigbench_hhh_alignment_other | 0 | multiple_choice_grade | 0.5581 | ± | 0.0766 |
bigbench_intent_recognition | 0 | multiple_choice_grade | 0.0925 | ± | 0.0110 |
bigbench_misconceptions | 0 | multiple_choice_grade | 0.4403 | ± | 0.0430 |
bigbench_paraphrase | 0 | multiple_choice_grade | 0.5000 | ± | 0.0354 |
bigbench_sentence_ambiguity | 0 | multiple_choice_grade | 0.4833 | ± | 0.0651 |
bigbench_similarities_abstraction | 0 | multiple_choice_grade | 0.5921 | ± | 0.0567 |
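For what it's worth, the two summary columns in the comparison table look like plain unweighted means over these per-task scores, with "Without math" simply dropping the `bigbench_arithmetic_*` tasks; the short Python check below reproduces the 0.3170 and 0.4062 figures for this model from the table above.

```python
# Sketch: reproduce the summary scores as unweighted means of the per-task
# multiple_choice_grade values above (an assumption the numbers confirm).
scores = {
    "bigbench_analogies": 0.4846,
    "bigbench_arithmetic_1_digit_addition": 0.0300,
    "bigbench_arithmetic_1_digit_division": 0.0435,
    "bigbench_arithmetic_1_digit_multiplication": 0.0200,
    "bigbench_arithmetic_1_digit_subtraction": 0.0700,
    "bigbench_arithmetic_2_digit_addition": 0.2200,
    "bigbench_arithmetic_2_digit_division": 0.0800,
    "bigbench_arithmetic_2_digit_multiplication": 0.2400,
    "bigbench_arithmetic_2_digit_subtraction": 0.1800,
    "bigbench_arithmetic_3_digit_addition": 0.3300,
    "bigbench_arithmetic_3_digit_division": 0.2100,
    "bigbench_arithmetic_3_digit_multiplication": 0.3000,
    "bigbench_arithmetic_3_digit_subtraction": 0.5500,
    "bigbench_arithmetic_4_digit_addition": 0.2800,
    "bigbench_arithmetic_4_digit_division": 0.2500,
    "bigbench_arithmetic_4_digit_multiplication": 0.1500,
    "bigbench_arithmetic_4_digit_subtraction": 0.4400,
    "bigbench_arithmetic_5_digit_addition": 0.5100,
    "bigbench_arithmetic_5_digit_division": 0.3000,
    "bigbench_arithmetic_5_digit_multiplication": 0.3100,
    "bigbench_arithmetic_5_digit_subtraction": 0.4000,
    "bigbench_cause_and_effect_one_sentence": 0.5882,
    "bigbench_cause_and_effect_one_sentence_no_prompt": 0.3922,
    "bigbench_cause_and_effect_two_sentences": 0.4510,
    "bigbench_emotions": 0.1938,
    "bigbench_empirical_judgments": 0.3434,
    "bigbench_general_knowledge": 0.2714,
    "bigbench_hhh_alignment_harmless": 0.3966,
    "bigbench_hhh_alignment_helpful": 0.3729,
    "bigbench_hhh_alignment_honest": 0.3390,
    "bigbench_hhh_alignment_other": 0.5581,
    "bigbench_intent_recognition": 0.0925,
    "bigbench_misconceptions": 0.4403,
    "bigbench_paraphrase": 0.5000,
    "bigbench_sentence_ambiguity": 0.4833,
    "bigbench_similarities_abstraction": 0.5921,
}

overall = sum(scores.values()) / len(scores)
no_math = [v for k, v in scores.items() if not k.startswith("bigbench_arithmetic")]
without_math = sum(no_math) / len(no_math)

print(f"FIN-bench:    {overall:.4f}")       # 0.3170
print(f"Without math: {without_math:.4f}")  # 0.4062
```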
## Uploaded Llama-3.2-Finnish-Wikipedia-1B model
- Developed by: mpasila
- License: Llama 3.2 Community License Agreement
- Finetuned from model: unsloth/Llama-3.2-1B
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.