Qwen2.5-0.5B-Instruct-ITA

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct on the ReDiX/DataForge dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4100

Model description

This model is an example of fine-tuning a small LLM (sLLM). Scores on the Italian evaluation benchmarks improved, and the model learned from the training data as expected.
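Below is a minimal inference sketch. It assumes the model is loaded from the Hugging Face Hub under the repository id ReDiX/Qwen2.5-0.5B-Instruct-ITA and that the tokenizer ships the Qwen2.5 chat template. The helper names build_messages and generate are illustrative and are not part of any published API. transformers is imported inside generate so the message-building helper works without it.

```python
"""Minimal inference sketch (illustrative helpers, not an official API)."""

MODEL_ID = "ReDiX/Qwen2.5-0.5B-Instruct-ITA"  # repository id listed on the model page


def build_messages(prompt, system=None):
    """Return a chat in the role/content format used by apply_chat_template."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


def generate(prompt, max_new_tokens=256):
    """Run one chat turn and return only the model's reply."""
    # Imported lazily so build_messages can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    # Render the chat with the tokenizer's template and append the assistant prefix.
    input_ids = tokenizer.apply_chat_template(
        build_messages(prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)


# Example: print(generate("Qual è la capitale d'Italia?"))
```

Decoding settings come from the model's generation config unless you pass them to model.generate explicitly.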

Intended uses & limitations

More information needed

Training and evaluation data

Evaluation results on Italian benchmarks (↑ = higher is better):

Task           Version  Filter  n-shot  Metric       Value   Stderr
arc_it         2        none    0       acc ↑        0.2378  ± 0.0125
arc_it         2        none    0       acc_norm ↑   0.2823  ± 0.0132
hellaswag_it   1        none    0       acc ↑        0.3163  ± 0.0049
hellaswag_it   1        none    0       acc_norm ↑   0.3800  ± 0.0051
m_mmlu_it      0        none    5       acc ↑        0.381   ± 0.0042
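The Stderr column can be read as a 95% confidence interval, value ± 1.96 × stderr. This sketch assumes the reported Stderr is the standard error of the mean score and that a normal approximation is adequate at these sample sizes.

```python
# Results copied from the table above: (task, metric) -> (value, stderr).
RESULTS = {
    ("arc_it", "acc"): (0.2378, 0.0125),
    ("arc_it", "acc_norm"): (0.2823, 0.0132),
    ("hellaswag_it", "acc"): (0.3163, 0.0049),
    ("hellaswag_it", "acc_norm"): (0.3800, 0.0051),
    ("m_mmlu_it", "acc"): (0.381, 0.0042),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def ci95(value, stderr):
    """Normal-approximation 95% confidence interval for a benchmark score."""
    half_width = Z_95 * stderr
    return value - half_width, value + half_width


for (task, metric), (value, stderr) in RESULTS.items():
    low, high = ci95(value, stderr)
    print(f"{task:<13} {metric:<9} {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

For arc_it acc, for example, this gives roughly 0.2133 to 0.2623.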

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_bnb_8bit with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 2
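The batch and schedule settings above can be checked with a little arithmetic. total_train_batch_size is train_batch_size × gradient_accumulation_steps × number of devices, so 4 × 4 = 16 implies a single device (an assumption). With sample_packing enabled in the config, each of those 16 items is a packed sequence of up to 4,096 tokens. The learning-rate function assumes the cosine scheduler behaves like Hugging Face's get_cosine_schedule_with_warmup (linear warmup, then cosine decay to zero). TOTAL_STEPS is an estimate inferred from the step/epoch columns of the training results table, not a reported value.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 1e-4
MICRO_BATCH_SIZE = 4      # train_batch_size per device
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 10
NUM_DEVICES = 1           # assumption: single GPU, consistent with total_train_batch_size 16

# Estimate, not from the card: about 775 optimizer steps per epoch x 2 epochs,
# inferred from the step/epoch columns of the training results table.
TOTAL_STEPS = 1549


def effective_batch_size(micro, accum, devices):
    """Number of (packed) sequences contributing to each optimizer step."""
    return micro * accum * devices


def cosine_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then cosine decay to 0 at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 16
for step in (0, 5, 10, 780, TOTAL_STEPS):
    print(step, f"{cosine_lr(step):.2e}")
```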

Built with Axolotl

Axolotl config (axolotl version: 0.5.0):

base_model: Qwen/Qwen2.5-0.5B-Instruct

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: ./dataforge
    type: chat_template

    field_messages: conversations
    message_field_role: from
    message_field_content: value

# chat_template: chatml
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./outputs/qwen05B

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.0.mlp.down_proj
- model.layers.23.mlp.down_proj
- model.layers.1.mlp.down_proj
- model.layers.16.mlp.down_proj
- model.layers.4.mlp.down_proj
- model.layers.17.mlp.down_proj
# mlp.gate_proj layers
- model.layers.0.mlp.gate_proj
- model.layers.1.mlp.gate_proj
- model.layers.2.mlp.gate_proj
- model.layers.3.mlp.gate_proj
- model.layers.4.mlp.gate_proj
- model.layers.7.mlp.gate_proj
# mlp.up_proj layers
- model.layers.1.mlp.up_proj
- model.layers.0.mlp.up_proj
- model.layers.3.mlp.up_proj
- model.layers.4.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.9.mlp.up_proj
# self_attn.k_proj layers
- model.layers.18.self_attn.k_proj
- model.layers.7.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.2.self_attn.k_proj
- model.layers.6.self_attn.k_proj
- model.layers.9.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.16.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.0.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.4.self_attn.o_proj
- model.layers.3.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.13.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.21.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.6.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.2.self_attn.v_proj
- model.layers.3.self_attn.v_proj
- model.layers.4.self_attn.v_proj
- model.layers.5.self_attn.v_proj
- model.layers.7.self_attn.v_proj
- model.layers.8.self_attn.v_proj



sequence_len: 4096
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true


wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name: qwen2.5-0.5B
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 1.0e-04

train_on_inputs: false
group_by_length: false
bf16: true
fp16: 
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 5
xformers_attention:
flash_attention: true


warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|im_end|>"
  eos_token: "<|im_end|>"

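The datasets block reads each record's conversations list and takes the speaker from the from field and the text from the value field (ShareGPT-style fields). The sketch below shows that mapping on a hypothetical record and renders the result in simplified ChatML, the format whose end-of-turn marker <|im_end|> the config uses as both pad and EOS token. The human/gpt to user/assistant role mapping is an assumption about axolotl's defaults, and the real Qwen2.5 chat template does more than this (for example, it inserts a default system prompt when none is given).

```python
# Hypothetical example record in the shape the config describes:
# field_messages=conversations, message_field_role=from, message_field_content=value.
record = {
    "conversations": [
        {"from": "system", "value": "Sei un assistente utile."},
        {"from": "human", "value": "Ciao!"},
        {"from": "gpt", "value": "Ciao! Come posso aiutarti?"},
    ]
}

# Assumed ShareGPT -> chat-template role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_messages(rec, field="conversations", role_key="from", content_key="value"):
    """Rename ShareGPT-style turns to role/content dicts."""
    return [
        {"role": ROLE_MAP.get(turn[role_key], turn[role_key]), "content": turn[content_key]}
        for turn in rec[field]
    ]


def render_chatml(messages):
    """Simplified ChatML rendering: <|im_end|> closes every turn."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


print(render_chatml(to_messages(record)))
```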

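unfrozen_parameters restricts training to the listed tensors; everything else stays frozen. Assuming each entry is treated as a regular expression and applied with re.match against parameter names (our reading, not verified against axolotl's source), the sketch below shows which names a subset of the patterns selects. The dots in the patterns are unescaped, so they match any character, yet the layer-1 pattern still cannot select layer 10: after the wildcard consumes the 0 of "10", the next literal character fails to match.

```python
import re

# A few patterns copied from unfrozen_parameters above.
UNFROZEN_PATTERNS = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    r"model.layers.0.mlp.down_proj",
    r"model.layers.1.mlp.up_proj",
    r"model.layers.13.self_attn.q_proj",
]


def is_trainable(param_name, patterns=UNFROZEN_PATTERNS):
    """Assumption: a parameter stays trainable if any pattern re.match-es its name."""
    return any(re.match(p, param_name) for p in patterns)


# Parameter names in the usual Hugging Face Qwen2 naming scheme.
for name in [
    "lm_head.weight",
    "model.embed_tokens.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.10.mlp.down_proj.weight",
    "model.layers.1.mlp.up_proj.weight",
    "model.layers.10.mlp.up_proj.weight",
    "model.layers.13.self_attn.q_proj.bias",
    "model.layers.13.self_attn.k_proj.weight",
]:
    print(f"{name:<42} {'trainable' if is_trainable(name) else 'frozen'}")
```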
Training results

Training Loss  Epoch   Step  Validation Loss
No log         0.0013  1     1.7855
1.2567         0.2504  194   1.5639
1.2551         0.5008  388   1.4980
1.1845         0.7512  582   1.4501
1.3178         1.0019  776   1.4252
1.06           1.2523  970   1.4187
1.0697         1.5027  1164  1.4116
1.0362         1.7531  1358  1.4100
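The validation losses can be turned into perplexities, assuming they are mean per-token cross-entropy values in nats (the usual convention for Hugging Face causal language model training). The same rows also show the evaluation cadence: with evals_per_epoch: 4, evaluations land every 194 optimizer steps, which puts one epoch at roughly 775 steps.

```python
import math

# (step, epoch, validation loss) rows from the table above.
EVALS = [
    (1, 0.0013, 1.7855),
    (194, 0.2504, 1.5639),
    (388, 0.5008, 1.4980),
    (582, 0.7512, 1.4501),
    (776, 1.0019, 1.4252),
    (970, 1.2523, 1.4187),
    (1164, 1.5027, 1.4116),
    (1358, 1.7531, 1.4100),
]


def perplexity(mean_nll):
    """Perplexity from a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll)


first_loss, final_loss = EVALS[0][2], EVALS[-1][2]
print(f"validation perplexity: {perplexity(first_loss):.2f} -> {perplexity(final_loss):.2f}")
print(f"relative loss reduction: {(first_loss - final_loss) / first_loss:.1%}")

# Spacing between the regular evaluations (the step-1 eval is a baseline).
intervals = {b[0] - a[0] for a, b in zip(EVALS[1:], EVALS[2:])}
print(f"eval interval(s): {sorted(intervals)} steps")
```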

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
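To reproduce the training environment, it can help to compare installed packages against the versions listed above. This sketch uses importlib.metadata from the standard library and assumes the PyPI distribution names transformers, torch, datasets and tokenizers. The comparison is an exact string match, so a CPU build reporting torch 2.5.1 without the +cu124 suffix is flagged as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (PyPI distribution names assumed).
EXPECTED = {
    "transformers": "4.46.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def version_mismatches(expected=EXPECTED, lookup=version):
    """Return {package: (expected, installed or None)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package not installed
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


print(version_mismatches() or "environment matches the card")
```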

Hugging Face repository: ReDiX/Qwen2.5-0.5B-Instruct-ITA