Llama 3.1-8B with continued pre-training on Bulgarian text.
Dataset: part of the Bulgarian Wikipedia plus the author's own custom dataset.
Goal: provide a stronger Bulgarian-language base model for further fine-tuning.
- Base model