shakhizat/flash_attention_fine-tuned_mistral

Generated from Trainer


shakhizat committed on Jun 3, 2024

Commit 09d0ebf · verified · 1 Parent(s): 72f1ff5

Model save

Files changed (1)

README.md +15 -15


@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.1622
 ## Model description
@@ -38,11 +38,11 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size: 4
-- eval_batch_size: 4
 - seed: 42
-- gradient_accumulation_steps: 6
-- total_train_batch_size: 24
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.01
@@ -52,16 +52,16 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log        | 0.0244 | 10   | 1.2046          |
-| No log        | 0.0487 | 20   | 1.1875          |
-| 1.1978        | 0.0731 | 30   | 1.1797          |
-| 1.1978        | 0.0975 | 40   | 1.1763          |
-| 1.1462        | 0.1219 | 50   | 1.1736          |
-| 1.1462        | 0.1462 | 60   | 1.1712          |
-| 1.1462        | 0.1706 | 70   | 1.1701          |
-| 1.137         | 0.1950 | 80   | 1.1681          |
-| 1.137         | 0.2193 | 90   | 1.1645          |
-| 1.1484        | 0.2437 | 100  | 1.1622          |
 ### Framework versions

 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.1909
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
+- train_batch_size: 1
+- eval_batch_size: 2
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 2
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.01
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| No log        | 0.0020 | 10   | 1.2457          |
+| No log        | 0.0041 | 20   | 1.2098          |
+| 1.3368        | 0.0061 | 30   | 1.2134          |
+| 1.3368        | 0.0081 | 40   | 1.2185          |
+| 1.2308        | 0.0102 | 50   | 1.2187          |
+| 1.2308        | 0.0122 | 60   | 1.2190          |
+| 1.2308        | 0.0142 | 70   | 1.2074          |
+| 1.2921        | 0.0163 | 80   | 1.2018          |
+| 1.2921        | 0.0183 | 90   | 1.1937          |
+| 1.2655        | 0.0203 | 100  | 1.1909          |
 ### Framework versions
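
The batch-size change in this diff can be checked with a little arithmetic. The sketch below assumes a single training device, which the diff does not state, and uses a hypothetical helper `effective_batch_size`. It reproduces the `total_train_batch_size` values from both sides of the diff, then compares the epoch fraction reached at step 100 in each results table.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


# Values taken from the removed (-) and added (+) lines of the diff.
old_total = effective_batch_size(per_device_batch=4, grad_accum_steps=6)
new_total = effective_batch_size(per_device_batch=1, grad_accum_steps=2)
print(old_total, new_total)  # 24 2

# Epoch fraction at step 100, read from the two results tables.
old_epoch_at_100 = 0.2437
new_epoch_at_100 = 0.0203

# If the dataset is unchanged, the epoch fraction covered in the same number
# of steps should scale with the effective batch size.
batch_ratio = old_total / new_total
epoch_ratio = old_epoch_at_100 / new_epoch_at_100
print(batch_ratio, round(epoch_ratio, 2))
```

The two ratios agree closely, which is consistent with both runs using the same dataset. The new run simply sees about one twelfth as many examples in its first 100 steps, so the two final validation losses are not measured at comparable points in training.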