MathiasBrussow/shawgpt-ft

Files changed (3)

README.md +37 -25
runs/Oct24_09-40-20_c9e665a1658b/events.out.tfevents.1729762824.c9e665a1658b.1073.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.9102
 ## Model description
@@ -35,42 +35,54 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 24
-- eval_batch_size: 24
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 96
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 2
-- num_epochs: 20
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch   | Step | Validation Loss |
 |:-------------:|:-------:|:----:|:---------------:|
-| 2.5215        | 1.0     | 1    | 3.3144          |
-| 1.1816        | 2.0     | 3    | 3.0479          |
-| 1.0977        | 3.0     | 5    | 2.7926          |
-| 2.0195        | 4.0     | 6    | 2.6715          |
-| 2.0273        | 5.0     | 7    | 2.5671          |
-| 0.8926        | 6.0     | 9    | 2.3771          |
-| 0.8271        | 7.0     | 11   | 2.2092          |
-| 1.5869        | 8.0     | 12   | 2.1461          |
-| 1.5469        | 9.0     | 13   | 2.0923          |
-| 0.709         | 10.0    | 15   | 2.0085          |
-| 0.75          | 11.0    | 17   | 1.9511          |
-| 1.3555        | 12.0    | 18   | 1.9306          |
-| 1.3994        | 13.0    | 19   | 1.9170          |
-| 0.4607        | 13.3333 | 20   | 1.9102          |
 ### Framework versions
-- PEFT 0.13.0
 - Transformers 4.44.2
-- Pytorch 2.4.1+cu121
-- Datasets 3.0.1
 - Tokenizers 0.19.1

 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.3327
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 5
+- num_epochs: 30
 - mixed_precision_training: Native AMP
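
The `total_train_batch_size` values on both sides of this diff follow from the per-device batch size multiplied by the gradient accumulation steps. A minimal sketch of that arithmetic is below. The helper name `effective_batch_size` and the `num_devices=1` default are assumptions for illustration; the card does not state a device count.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step.

    num_devices=1 is an assumption; the model card lists no device count.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values taken from the removed (-) and added (+) hyperparameter lines.
old_total = effective_batch_size(per_device_batch_size=24, gradient_accumulation_steps=4)
new_total = effective_batch_size(per_device_batch_size=8, gradient_accumulation_steps=2)

print(old_total, new_total)
```

Under that single-device assumption, the results agree with the `total_train_batch_size` entries of 96 (before) and 16 (after).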
 ### Training results
 | Training Loss | Epoch   | Step | Validation Loss |
 |:-------------:|:-------:|:----:|:---------------:|
+| 4.9892        | 0.8571  | 3    | 4.1652          |
+| 3.4756        | 2.0     | 7    | 3.7140          |
+| 4.1288        | 2.8571  | 10   | 3.3485          |
+| 2.74          | 4.0     | 14   | 2.9038          |
+| 3.2574        | 4.8571  | 17   | 2.6228          |
+| 2.1252        | 6.0     | 21   | 2.2779          |
+| 2.4458        | 6.8571  | 24   | 2.0512          |
+| 1.6206        | 8.0     | 28   | 1.7919          |
+| 1.9223        | 8.8571  | 31   | 1.6560          |
+| 1.2633        | 10.0    | 35   | 1.5061          |
+| 1.6093        | 10.8571 | 38   | 1.4323          |
+| 1.0947        | 12.0    | 42   | 1.3917          |
+| 1.4763        | 12.8571 | 45   | 1.3740          |
+| 1.0425        | 14.0    | 49   | 1.3550          |
+| 1.4106        | 14.8571 | 52   | 1.3455          |
+| 1.0377        | 16.0    | 56   | 1.3378          |
+| 1.3563        | 16.8571 | 59   | 1.3361          |
+| 0.9793        | 18.0    | 63   | 1.3342          |
+| 1.3045        | 18.8571 | 66   | 1.3328          |
+| 0.942         | 20.0    | 70   | 1.3332          |
+| 1.3308        | 20.8571 | 73   | 1.3338          |
+| 0.9164        | 22.0    | 77   | 1.3343          |
+| 1.2619        | 22.8571 | 80   | 1.3345          |
+| 0.9497        | 24.0    | 84   | 1.3335          |
+| 1.2863        | 24.8571 | 87   | 1.3328          |
+| 0.8351        | 25.7143 | 90   | 1.3327          |
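
The fractional `Epoch` values in the added rows are consistent with a fixed ratio of optimizer steps per epoch. The sketch below infers that ratio from the row `2.0 | 7` and reproduces a few other rows. The constant `STEPS_PER_EPOCH` and the helper `epoch_at` are hypothetical names, and the ratio is read off the table rather than taken from any logged value.

```python
# Inferred from the table row "Epoch 2.0 | Step 7": 7 steps / 2 epochs.
# This ratio is an assumption derived from the table, not a logged value.
STEPS_PER_EPOCH = 7 / 2.0


def epoch_at(step: int) -> float:
    """Fractional epoch for a given optimizer step, rounded to 4 decimals as in the table."""
    return round(step / STEPS_PER_EPOCH, 4)


# Check a few (step, epoch) pairs copied from the added rows of the table.
rows = [(3, 0.8571), (10, 2.8571), (70, 20.0), (90, 25.7143)]
for step, reported_epoch in rows:
    print(step, epoch_at(step), reported_epoch)
```

By this reading, the last logged step (90) falls at epoch 25.7143, short of the configured `num_epochs: 30`.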
 ### Framework versions
+- PEFT 0.13.2
 - Transformers 4.44.2
+- Pytorch 2.5.0+cu121
+- Datasets 3.0.2
 - Tokenizers 0.19.1

runs/Oct24_09-40-20_c9e665a1658b/events.out.tfevents.1729762824.c9e665a1658b.1073.0 ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cc11234feb8b51d0b590ca3eea0b29abcb291a391d3bf9702737149be27441b9
+size 18228

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a9ffa354f30eadd1db67eadc41f82f0a574f1c9dc58abe00cbd9546b5444cc8a
 size 5176

 version https://git-lfs.github.com/spec/v1
+oid sha256:e4364efe670a0f15277002a359873c1eb08fa4beb6e4cd35994eb0094db85793
 size 5176