Training details
Hello, and thanks for the great model!
If I understood correctly, after DPO on an English dataset, the model was then trained on Italian data.
Can you share more details about this step? I can't find the related script on GitHub...
Hi, you can find the DPO script here: https://github.com/marcopoli/LLaMAntino-3-ANITA/blob/main/model_adaptation/dpo_llama3.py
and the SFT script here: https://github.com/marcopoli/LLaMAntino-3-ANITA/blob/main/model_adaptation/finetune_llama3.py
Just change "model_name" and "dataset" accordingly. For the adaptation to the Italian language, just use the SFT script on a small portion of an Italian dataset (e.g., gsarti/clean_mc4_it), using plain text without a chat template, i.e. `<|begin_of_text|> {text} <|eot_id|><|end_of_text|>`.
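For reference, the data-preparation part of that step looks roughly like this (a minimal sketch; the config name and slice size are illustrative guesses, only the prompt format is the one described above):

```python
from datasets import load_dataset

# A small portion of the Italian corpus; the config name and slice size
# here are illustrative, not necessarily the exact ones used for ANITA.
dataset = load_dataset("gsarti/clean_mc4_it", "tiny", split="train[:100000]")

def to_plain_text(example):
    # Plain text, no chat template, in the format described above.
    return {"text": f"<|begin_of_text|> {example['text']} <|eot_id|><|end_of_text|>"}

dataset = dataset.map(to_plain_text)
# Then pass `dataset` to the SFT script's trainer with dataset_text_field="text".
```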
Thanks.
Very informative!
Hi @m-polignano-uniba ,
Is the fine-tuning on the Italian language performed with QLoRA/LoRA, or without?
Yes, we used QLoRA through Unsloth:
- load_in_4bit=True, r=64, lora_alpha=16, target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
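In Unsloth terms, that corresponds roughly to the following sketch (only the values quoted above come from this thread; the model name, sequence length, dropout, and bias settings are illustrative assumptions):

```python
from unsloth import FastLanguageModel

# Load the model in 4-bit (the "Q" in QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",  # change accordingly
    max_seq_length=2048,  # illustrative; not stated in this thread
    load_in_4bit=True,
)

# Attach the LoRA adapters with the settings quoted above.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,  # illustrative default
    bias="none",       # illustrative default
)
```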
During the language adaptation phase, can you share a rough idea of the peak GPU VRAM usage?
In the paper, I read you used an NVIDIA H100 64GB GPU, but further details would be much appreciated.
Unfortunately, we use an HPC cluster that does not allow us to check VRAM usage during training (mostly because the GPUs are shared). Just a small correction: the graphics card is a custom NVIDIA A100-SXM-64GB (https://www.nvidia.com/it-it/data-center/a100/).
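If you want to measure it on your own runs, a rough figure can be logged from inside the training process itself (a sketch, assuming a `trainer` configured as in the scripts linked above):

```python
import torch

torch.cuda.reset_peak_memory_stats()
trainer.train()  # any configured TRL trainer, as in the scripts above

# "allocated" is what tensors actually used; "reserved" also includes the
# caching allocator's overhead and is closer to what nvidia-smi reports.
print(f"Peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
print(f"Peak reserved:  {torch.cuda.max_memory_reserved() / 2**30:.1f} GiB")
```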
Hi,
Thanks for the great model!
It's unclear to me what pipeline you followed. Based on the above messages, it looks like you fine-tuned the llama3-instruct model on raw Italian text, but based on the README it looks like you actually used Italian instruction data. Then, you fine-tuned with DPO on the English dataset. Is this correct, or am I missing something? Thanks!
@antoniox2dos you can find this information in the paper https://arxiv.org/abs/2405.07101
In short (copy-pasting from a recent post of mine):
⚙️ The **training process** is quite original and interesting:
1️⃣ Built on 🦙 Llama-3-8B-Instruct (not a base model)
2️⃣ Fine-tuned on a mix of English instruction datasets (100K prompts, Chat-Error/wizard_alpaca_dolly_orca)
3️⃣ Direct Preference Optimization on Maxime Labonne's orpo-dpo-mix-40k (a good collection of English preference datasets, mainly by Argilla); see the sketch after this list
4️⃣ 🇮🇹 Italian Adaptation: further fine-tuning on 100k examples from clean_mc4_it by Gabriele Sarti
🛠️ All training steps utilized QLoRA (Quantized Low-Rank Adaptation) with Unsloth AI and Hugging Face TRL.
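For step 3️⃣, a minimal DPO sketch could look like the following. Note the assumptions: the authors used Unsloth, while this sketch uses plain TRL + PEFT for brevity, targets the TRL API of that period (`beta` and `tokenizer` passed directly to `DPOTrainer`), and every hyperparameter is illustrative, not necessarily what was actually used:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # or your SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-3 ships without a pad token

# 4-bit base model (QLoRA-style quantization).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)

# LoRA settings matching the values quoted earlier in this thread.
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.0, bias="none", task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

def to_dpo_format(example):
    # The dataset stores full conversations; split each into a prompt plus
    # the final assistant turn for the chosen/rejected responses.
    prompt = tokenizer.apply_chat_template(
        example["chosen"][:-1], tokenize=False, add_generation_prompt=True
    )
    return {
        "prompt": prompt,
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

dataset = dataset.map(to_dpo_format)

trainer = DPOTrainer(
    model=model,
    ref_model=None,           # with PEFT, TRL derives the reference model by disabling adapters
    beta=0.1,                 # illustrative
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,  # DPOTrainer attaches the LoRA adapters itself
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-6,
        num_train_epochs=1,
        remove_unused_columns=False,  # keep prompt/chosen/rejected columns
        output_dir="dpo-out",
    ),
)
trainer.train()
```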