shawgpt-ft-epoch-17

This model is a fine-tuned version of TheBloke/Mistral-7B-Instruct-v0.2-GPTQ on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6155
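As an illustrative sketch (not part of the original card), the adapter can be loaded on top of the GPTQ base model with PEFT roughly as follows. The adapter repo id is inferred from this card's page, and loading the GPTQ checkpoint additionally requires a GPTQ backend (e.g. optimum/auto-gptq) to be installed:

```python
# Minimal, untested loading sketch: attaches this PEFT adapter to the
# quantized base model. Repo ids and the prompt are assumptions for
# illustration only.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
adapter_id = "Jonasbukhave/shawgpt-ft-epoch-17"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)  # apply the fine-tuned adapter
model.eval()

prompt = "[INST] Summarize what this fine-tuned model does. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```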

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: paged_adamw_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 17
  • mixed_precision_training: Native AMP
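For reference, a hedged sketch of these settings expressed as transformers TrainingArguments (4.47 API) is given below; the output directory and the fp16 flag (standing in for "Native AMP") are assumptions, and the model/dataset setup is omitted:

```python
# Sketch only: mirrors the hyperparameters listed above via the Trainer API.
# paged_adamw_8bit requires bitsandbytes; names not in the list above are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="shawgpt-ft-epoch-17",   # placeholder output path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # effective batch size 8 * 4 = 32 (total_train_batch_size)
    num_train_epochs=17,
    lr_scheduler_type="linear",
    warmup_steps=2,
    optim="paged_adamw_8bit",           # PAGED_ADAMW_8BIT from the list above
    fp16=True,                          # assumed realization of "Native AMP"
    seed=42,
)
```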

Training results

Training Loss | Epoch   | Step | Validation Loss
------------- | ------- | ---- | ---------------
25.5434       | 0.5714  |    1 | 4.2401
25.6938       | 1.5714  |    2 | 4.1565
24.6460       | 2.5714  |    3 | 3.9657
23.5063       | 3.5714  |    4 | 3.7821
22.2803       | 4.5714  |    5 | 3.6134
21.3242       | 5.5714  |    6 | 3.4549
20.3798       | 6.5714  |    7 | 3.3075
19.6580       | 7.5714  |    8 | 3.1749
18.9316       | 8.5714  |    9 | 3.0579
18.1952       | 9.5714  |   10 | 2.9563
17.5537       | 10.5714 |   11 | 2.8690
17.0554       | 11.5714 |   12 | 2.7957
16.6773       | 12.5714 |   13 | 2.7354
16.3041       | 13.5714 |   14 | 2.6879
15.9872       | 14.5714 |   15 | 2.6520
15.7942       | 15.5714 |   16 | 2.6279
10.5046       | 16.5714 |   17 | 2.6155

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.1
  • Pytorch 2.5.1+cu121
  • Datasets 3.3.1
  • Tokenizers 0.21.0