---
base_model: unsloth/Llama-3.2-1B
language:
- en
- fi
license: llama3.2
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
datasets:
- wikimedia/wikipedia
---

This is a "continued pre-trained" model trained on the [Finnish Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.

I still don't understand why no one in Finland has figured out that they could just do continued pre-training on existing models that are already supported by every frontend. I've seen Japanese models perform quite well with that kind of continued pre-training, yet Finnish models are still trained from scratch, and it shows: compared to Llama 3 or Gemma 2 they fall far behind, and they can't even match Mistral 7B, a model from last year. Stop wasting money on training models from scratch; use these better models as a base and train them on all that closed-source data I don't have access to. Thank you.

LoRA: [mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B](https://huggingface.co/mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B)

Trained with regular LoRA (not quantized/QLoRA), with LoRA rank 128 and alpha set to 32. Trained for 1 epoch on an RTX 4090, which took about 12.5 hours.

The model still has some issues, so I might try training on Gemma 2 2B instead to see whether that makes a better base for this (Gemma 2 is already better at Finnish than Llama 3), and maybe add more Finnish datasets.
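Below is a minimal sketch of how this kind of continued pre-training run can be set up with Unsloth and TRL. The LoRA rank/alpha, the non-quantized setup, the dataset, and the single epoch match the description above; everything else (target modules, sequence length, batch size, learning rate) is an illustrative assumption rather than the exact recipe used, and the `SFTTrainer` arguments follow the older TRL API used in Unsloth notebooks:

```python
# Sketch of continued pre-training with Unsloth + TRL (illustrative only).
# LoRA rank/alpha and the dataset match the card; other settings are assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,
    load_in_4bit=False,  # regular LoRA, not QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=128,          # LoRA rank used for this run
    lora_alpha=32,  # LoRA alpha used for this run
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Finnish subset of Wikipedia ("20231101.fi" is one available snapshot).
dataset = load_dataset("wikimedia/wikipedia", "20231101.fi", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # assumption
        gradient_accumulation_steps=8,  # assumption
        num_train_epochs=1,             # 1 epoch, as described above
        learning_rate=2e-4,             # assumption
        output_dir="outputs",
    ),
)
trainer.train()
```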
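FIN-bench scores in the format reported below are produced with EleutherAI's lm-evaluation-harness, using a build that includes the FIN-bench `bigbench_*` tasks. A hedged sketch, assuming such a build is installed (the exact fork and task registration are not confirmed here):

```python
# Hedged sketch: running a couple of FIN-bench tasks through
# lm-evaluation-harness. Assumes a build that registers the
# FIN-bench bigbench_* tasks; task availability is not guaranteed.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=mpasila/Llama-3.2-Finnish-Wikipedia-1B",
    tasks=["bigbench_analogies", "bigbench_general_knowledge"],
    batch_size=8,
)
print(results["results"])
```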
## Evaluation

| Model | Size | Type | FIN-bench score | Without math |
|-------|------|------|-------|-------|
| **mpasila/Llama-3.2-Finnish-Wikipedia-1B** | 1B | Base | 0.3170 | 0.4062 |
| [unsloth/Llama-3.2-1B](https://huggingface.co/unsloth/Llama-3.2-1B) | 1B | Base | **0.4029** | 0.3881 |
| [Finnish-NLP/llama-7b-finnish](https://huggingface.co/Finnish-NLP/llama-7b-finnish) | 7B | Base | 0.2350 | 0.4203 |
| [LumiOpen/Viking-7B (1000B)](https://huggingface.co/LumiOpen/Viking-7B) | 7B | Base | 0.3721 | 0.4453 |
| [HPLT/gpt-7b-nordic-prerelease](https://huggingface.co/HPLT/gpt-7b-nordic-prerelease) | 7B | Base | 0.3169 | **0.4524** |

[Source](https://docs.google.com/spreadsheets/d/1rqJb9dQVihg-Z1_Ras1L_-wuzPg9xNzpdmM2x5HueeY/edit?usp=sharing)

#### FIN-bench scores:

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| bigbench_analogies | 0 | multiple_choice_grade | 0.4846 | ± | 0.0440 |
| bigbench_arithmetic_1_digit_addition | 0 | multiple_choice_grade | 0.0300 | ± | 0.0171 |
| bigbench_arithmetic_1_digit_division | 0 | multiple_choice_grade | 0.0435 | ± | 0.0435 |
| bigbench_arithmetic_1_digit_multiplication | 0 | multiple_choice_grade | 0.0200 | ± | 0.0141 |
| bigbench_arithmetic_1_digit_subtraction | 0 | multiple_choice_grade | 0.0700 | ± | 0.0256 |
| bigbench_arithmetic_2_digit_addition | 0 | multiple_choice_grade | 0.2200 | ± | 0.0416 |
| bigbench_arithmetic_2_digit_division | 0 | multiple_choice_grade | 0.0800 | ± | 0.0273 |
| bigbench_arithmetic_2_digit_multiplication | 0 | multiple_choice_grade | 0.2400 | ± | 0.0429 |
| bigbench_arithmetic_2_digit_subtraction | 0 | multiple_choice_grade | 0.1800 | ± | 0.0386 |
| bigbench_arithmetic_3_digit_addition | 0 | multiple_choice_grade | 0.3300 | ± | 0.0473 |
| bigbench_arithmetic_3_digit_division | 0 | multiple_choice_grade | 0.2100 | ± | 0.0409 |
| bigbench_arithmetic_3_digit_multiplication | 0 | multiple_choice_grade | 0.3000 | ± | 0.0461 |
| bigbench_arithmetic_3_digit_subtraction | 0 | multiple_choice_grade | 0.5500 | ± | 0.0500 |
| bigbench_arithmetic_4_digit_addition | 0 | multiple_choice_grade | 0.2800 | ± | 0.0451 |
| bigbench_arithmetic_4_digit_division | 0 | multiple_choice_grade | 0.2500 | ± | 0.0435 |
| bigbench_arithmetic_4_digit_multiplication | 0 | multiple_choice_grade | 0.1500 | ± | 0.0359 |
| bigbench_arithmetic_4_digit_subtraction | 0 | multiple_choice_grade | 0.4400 | ± | 0.0499 |
| bigbench_arithmetic_5_digit_addition | 0 | multiple_choice_grade | 0.5100 | ± | 0.0502 |
| bigbench_arithmetic_5_digit_division | 0 | multiple_choice_grade | 0.3000 | ± | 0.0461 |
| bigbench_arithmetic_5_digit_multiplication | 0 | multiple_choice_grade | 0.3100 | ± | 0.0465 |
| bigbench_arithmetic_5_digit_subtraction | 0 | multiple_choice_grade | 0.4000 | ± | 0.0492 |
| bigbench_cause_and_effect_one_sentence | 0 | multiple_choice_grade | 0.5882 | ± | 0.0696 |
| bigbench_cause_and_effect_one_sentence_no_prompt | 0 | multiple_choice_grade | 0.3922 | ± | 0.0690 |
| bigbench_cause_and_effect_two_sentences | 0 | multiple_choice_grade | 0.4510 | ± | 0.0704 |
| bigbench_emotions | 0 | multiple_choice_grade | 0.1938 | ± | 0.0313 |
| bigbench_empirical_judgments | 0 | multiple_choice_grade | 0.3434 | ± | 0.0480 |
| bigbench_general_knowledge | 0 | multiple_choice_grade | 0.2714 | ± | 0.0535 |
| bigbench_hhh_alignment_harmless | 0 | multiple_choice_grade | 0.3966 | ± | 0.0648 |
| bigbench_hhh_alignment_helpful | 0 | multiple_choice_grade | 0.3729 | ± | 0.0635 |
| bigbench_hhh_alignment_honest | 0 | multiple_choice_grade | 0.3390 | ± | 0.0622 |
| bigbench_hhh_alignment_other | 0 | multiple_choice_grade | 0.5581 | ± | 0.0766 |
| bigbench_intent_recognition | 0 | multiple_choice_grade | 0.0925 | ± | 0.0110 |
| bigbench_misconceptions | 0 | multiple_choice_grade | 0.4403 | ± | 0.0430 |
| bigbench_paraphrase | 0 | multiple_choice_grade | 0.5000 | ± | 0.0354 |
| bigbench_sentence_ambiguity | 0 | multiple_choice_grade | 0.4833 | ± | 0.0651 |
| bigbench_similarities_abstraction | 0 | multiple_choice_grade | 0.5921 | ± | 0.0567 |

# Uploaded Llama-3.2-Finnish-Wikipedia-1B model

- **Developed by:** mpasila
- **License:** Llama 3.2 Community License Agreement
- **Finetuned from model:** unsloth/Llama-3.2-1B

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
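If you just want to try the model, here is a quick generation example with plain Transformers. The prompt and sampling settings are only illustrative, and since this is a base model it should be given Finnish text to continue rather than chat-style instructions:

```python
# Minimal inference sketch with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mpasila/Llama-3.2-Finnish-Wikipedia-1B")
model = AutoModelForCausalLM.from_pretrained(
    "mpasila/Llama-3.2-Finnish-Wikipedia-1B",
    device_map="auto",
)

# "Finland is a state located in Northern Europe, whose..."
prompt = "Suomi on Pohjois-Euroopassa sijaitseva valtio, jonka"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The standalone LoRA adapter linked above can also be attached to the base model with PEFT's `PeftModel.from_pretrained` instead of downloading the merged weights.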