---
base_model: unsloth/Llama-3.2-1B
language:
- en
- fi
license: llama3.2
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
datasets:
- wikimedia/wikipedia
---

This is a "continued pre-trained" model trained on the [Finnish Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.

I still don't understand why no one in Finland has figured out that they could just do continued pre-training on existing models that are already supported by every frontend. I've seen Japanese models perform quite well with that kind of continued pre-training, yet Finnish models are still trained from scratch, and it shows: compared to Llama 3 or Gemma 2 they fall far behind, and they can't even match Mistral 7B, a model from last year. Stop wasting money on training models from scratch; use these better models as a base and train them on all that closed-source data I don't have access to. Thank you.

LoRA: [mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B](https://huggingface.co/mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B)

Trained with regular LoRA (not quantized/QLoRA), with LoRA rank 128 and alpha set to 32. Trained for 1 epoch on an RTX 4090, which took about 12.5 hours.

The model still has some issues, so I might try training on Gemma 2 2B instead to see whether that makes a better base for this (Gemma 2 is already better at Finnish than Llama 3), and maybe add more Finnish datasets.
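Below is a minimal sketch of how this kind of continued pre-training run can be set up with Unsloth and TRL. The LoRA rank/alpha, the non-quantized setup, the dataset, and the single epoch match the description above; everything else (target modules, sequence length, batch size, learning rate) is an illustrative assumption rather than the exact recipe used, and the `SFTTrainer` arguments follow the older TRL API used in Unsloth notebooks:

```python
# Sketch of continued pre-training with Unsloth + TRL (illustrative only).
# LoRA rank/alpha and the dataset match the card; other settings are assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,
    load_in_4bit=False,  # regular LoRA, not QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=128,          # LoRA rank used for this run
    lora_alpha=32,  # LoRA alpha used for this run
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Finnish subset of Wikipedia ("20231101.fi" is one available snapshot).
dataset = load_dataset("wikimedia/wikipedia", "20231101.fi", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # assumption
        gradient_accumulation_steps=8,  # assumption
        num_train_epochs=1,             # 1 epoch, as described above
        learning_rate=2e-4,             # assumption
        output_dir="outputs",
    ),
)
trainer.train()
```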
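FIN-bench scores in the format reported below are produced with EleutherAI's lm-evaluation-harness, using a build that includes the FIN-bench `bigbench_*` tasks. A hedged sketch, assuming such a build is installed (the exact fork and task registration are not confirmed here):

```python
# Hedged sketch: running a couple of FIN-bench tasks through
# lm-evaluation-harness. Assumes a build that registers the
# FIN-bench bigbench_* tasks; task availability is not guaranteed.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=mpasila/Llama-3.2-Finnish-Wikipedia-1B",
    tasks=["bigbench_analogies", "bigbench_general_knowledge"],
    batch_size=8,
)
print(results["results"])
```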
## Evaluation

| Model | Size | Type | FIN-bench score | Without math |
|-------|------|------|-------|-------|
| **mpasila/Llama-3.2-Finnish-Wikipedia-1B** | 1B | Base | 0.3170 | 0.4062 |
| [unsloth/Llama-3.2-1B](https://huggingface.co/unsloth/Llama-3.2-1B) | 1B | Base | **0.4029** | 0.3881 |
| [Finnish-NLP/llama-7b-finnish](https://huggingface.co/Finnish-NLP/llama-7b-finnish) | 7B | Base | 0.2350 | 0.4203 |
| [LumiOpen/Viking-7B (1000B)](https://huggingface.co/LumiOpen/Viking-7B) | 7B | Base | 0.3721 | 0.4453 |
| [HPLT/gpt-7b-nordic-prerelease](https://huggingface.co/HPLT/gpt-7b-nordic-prerelease) | 7B | Base | 0.3169 | **0.4524** |

[Source](https://docs.google.com/spreadsheets/d/1rqJb9dQVihg-Z1_Ras1L_-wuzPg9xNzpdmM2x5HueeY/edit?usp=sharing)

#### FIN-bench scores:

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| bigbench_analogies | 0 | multiple_choice_grade | 0.4846 | ± | 0.0440 |
| bigbench_arithmetic_1_digit_addition | 0 | multiple_choice_grade | 0.0300 | ± | 0.0171 |
| bigbench_arithmetic_1_digit_division | 0 | multiple_choice_grade | 0.0435 | ± | 0.0435 |
| bigbench_arithmetic_1_digit_multiplication | 0 | multiple_choice_grade | 0.0200 | ± | 0.0141 |
| bigbench_arithmetic_1_digit_subtraction | 0 | multiple_choice_grade | 0.0700 | ± | 0.0256 |
| bigbench_arithmetic_2_digit_addition | 0 | multiple_choice_grade | 0.2200 | ± | 0.0416 |
| bigbench_arithmetic_2_digit_division | 0 | multiple_choice_grade | 0.0800 | ± | 0.0273 |
| bigbench_arithmetic_2_digit_multiplication | 0 | multiple_choice_grade | 0.2400 | ± | 0.0429 |
| bigbench_arithmetic_2_digit_subtraction | 0 | multiple_choice_grade | 0.1800 | ± | 0.0386 |
| bigbench_arithmetic_3_digit_addition | 0 | multiple_choice_grade | 0.3300 | ± | 0.0473 |
| bigbench_arithmetic_3_digit_division | 0 | multiple_choice_grade | 0.2100 | ± | 0.0409 |
| bigbench_arithmetic_3_digit_multiplication | 0 | multiple_choice_grade | 0.3000 | ± | 0.0461 |
| bigbench_arithmetic_3_digit_subtraction | 0 | multiple_choice_grade | 0.5500 | ± | 0.0500 |
| bigbench_arithmetic_4_digit_addition | 0 | multiple_choice_grade | 0.2800 | ± | 0.0451 |
| bigbench_arithmetic_4_digit_division | 0 | multiple_choice_grade | 0.2500 | ± | 0.0435 |
| bigbench_arithmetic_4_digit_multiplication | 0 | multiple_choice_grade | 0.1500 | ± | 0.0359 |
| bigbench_arithmetic_4_digit_subtraction | 0 | multiple_choice_grade | 0.4400 | ± | 0.0499 |
| bigbench_arithmetic_5_digit_addition | 0 | multiple_choice_grade | 0.5100 | ± | 0.0502 |
| bigbench_arithmetic_5_digit_division | 0 | multiple_choice_grade | 0.3000 | ± | 0.0461 |
| bigbench_arithmetic_5_digit_multiplication | 0 | multiple_choice_grade | 0.3100 | ± | 0.0465 |
| bigbench_arithmetic_5_digit_subtraction | 0 | multiple_choice_grade | 0.4000 | ± | 0.0492 |
| bigbench_cause_and_effect_one_sentence | 0 | multiple_choice_grade | 0.5882 | ± | 0.0696 |
| bigbench_cause_and_effect_one_sentence_no_prompt | 0 | multiple_choice_grade | 0.3922 | ± | 0.0690 |
| bigbench_cause_and_effect_two_sentences | 0 | multiple_choice_grade | 0.4510 | ± | 0.0704 |
| bigbench_emotions | 0 | multiple_choice_grade | 0.1938 | ± | 0.0313 |
| bigbench_empirical_judgments | 0 | multiple_choice_grade | 0.3434 | ± | 0.0480 |
| bigbench_general_knowledge | 0 | multiple_choice_grade | 0.2714 | ± | 0.0535 |
| bigbench_hhh_alignment_harmless | 0 | multiple_choice_grade | 0.3966 | ± | 0.0648 |
| bigbench_hhh_alignment_helpful | 0 | multiple_choice_grade | 0.3729 | ± | 0.0635 |
| bigbench_hhh_alignment_honest | 0 | multiple_choice_grade | 0.3390 | ± | 0.0622 |
| bigbench_hhh_alignment_other | 0 | multiple_choice_grade | 0.5581 | ± | 0.0766 |
| bigbench_intent_recognition | 0 | multiple_choice_grade | 0.0925 | ± | 0.0110 |
| bigbench_misconceptions | 0 | multiple_choice_grade | 0.4403 | ± | 0.0430 |
| bigbench_paraphrase | 0 | multiple_choice_grade | 0.5000 | ± | 0.0354 |
| bigbench_sentence_ambiguity | 0 | multiple_choice_grade | 0.4833 | ± | 0.0651 |
| bigbench_similarities_abstraction | 0 | multiple_choice_grade | 0.5921 | ± | 0.0567 |

# Uploaded Llama-3.2-Finnish-Wikipedia-1B model

- **Developed by:** mpasila
- **License:** Llama 3.2 Community License Agreement
- **Finetuned from model:** unsloth/Llama-3.2-1B

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
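If you just want to try the model, here is a quick generation example with plain Transformers. The prompt and sampling settings are only illustrative, and since this is a base model it should be given Finnish text to continue rather than chat-style instructions:

```python
# Minimal inference sketch with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mpasila/Llama-3.2-Finnish-Wikipedia-1B")
model = AutoModelForCausalLM.from_pretrained(
    "mpasila/Llama-3.2-Finnish-Wikipedia-1B",
    device_map="auto",
)

# "Finland is a state located in Northern Europe, whose..."
prompt = "Suomi on Pohjois-Euroopassa sijaitseva valtio, jonka"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The standalone LoRA adapter linked above can also be attached to the base model with PEFT's `PeftModel.from_pretrained` instead of downloading the merged weights.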