Instruction fine-tuning script
Hello, firstly, thank you so much for releasing this model. It performs exceptionally well on Greek-to-English translation and keyword extraction tasks. I have a question regarding fine-tuning: does this model follow the same fine-tuning script as a traditional Llama 3.1 8B model? And have you tested the impact of LoRA/instruction fine-tuning (for a specific keyword extraction task, let's say) on its other tasks?
Hello, you should be good to go using any traditional Llama recipe.
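For illustration, here is a minimal sketch of one such recipe using transformers, peft, and trl; the model ID, dataset file, and hyperparameters are placeholders rather than a validated configuration:

```python
# Minimal LoRA fine-tuning sketch with transformers + peft + trl.
# The model ID, dataset file, and hyperparameters below are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model_id = "your-org/your-llama-3.1-8b-model"  # placeholder

# Dataset with a "text" column containing fully formatted prompt/target strings.
dataset = load_dataset("json", data_files="keyword_extraction.jsonl", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=AutoModelForCausalLM.from_pretrained(model_id),
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="keyword-extraction-lora",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        dataset_text_field="text",
    ),
)
trainer.train()
```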
The only deviation from the traditional Llama model that may affect you is that our CPT used a padded packing strategy, so we set one of the Llama reserved tokens as padding. This means that if for any reason you're pre-padding your data in your recipe AND not using `tokenizer.pad_token` to do it, you'll have to adjust that step (it's an extreme edge case).
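In other words, as long as padding goes through the tokenizer, the reserved pad token is picked up automatically; a quick sketch (the model ID is a placeholder):

```python
# Sketch: let the tokenizer supply the pad token instead of hard-coding one,
# so the reserved token configured as padding is used automatically.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/your-llama-3.1-8b-model")  # placeholder
print(tokenizer.pad_token, tokenizer.pad_token_id)  # the reserved token used as padding

# Padding via the tokenizer uses tokenizer.pad_token under the hood;
# avoid manually appending a hard-coded id (e.g. the eos id) as padding.
batch = tokenizer(
    ["Extract the keywords from this sentence.",
     "A longer example input that forces the shorter one to be padded."],
    padding=True,
    return_tensors="pt",
)
```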
We haven't seen any indication that the model is impacted negatively by LoRA/full fine-tuning in a way that differs from other models of its size.
@LVouk Thank you for the quick response! Looking forward to using the model.