Yofuria/Mistral-7B-base-simpo-qlora

Tags: alignment-handbook, Generated from Trainer, 4-bit precision


Yofuria committed on Jul 4, 2024

Commit f6631a2 (verified) · 1 parent: e9d0fd9

Update README.md

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -19,7 +19,7 @@ should probably proofread and complete it, then remove this comment. -->
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/nlp-xiaobo/huggingface/runs/fgjarr1f)
 # Mistral-7B-base-simpo-qlora
-This model is a fine-tuned version of [/home/yofuria/PLM/SimPO/zephyr-7b-sft-qlora](https://huggingface.co//home/yofuria/PLM/SimPO/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
 - Loss: 1.5543
 - Rewards/chosen: -2.0201
@@ -49,11 +49,11 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 3e-07
-- train_batch_size: 8
+- train_batch_size: 2
 - eval_batch_size: 4
 - seed: 42
 - gradient_accumulation_steps: 8
-- total_train_batch_size: 64
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
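This commit changes train_batch_size from 8 to 2 and total_train_batch_size from 64 to 16 while gradient_accumulation_steps stays at 8. Both pairs fit the relation total = per-device batch × accumulation steps × number of devices, assuming a single device (the card does not state the device count). Because the effective batch size sets how many optimizer steps an epoch takes, it also shifts the cosine schedule with a 0.1 warmup ratio. The sketch below is a hypothetical reimplementation that mirrors the shape of the linear-warmup-then-cosine-decay schedule used by Hugging Face `transformers`; it is not the library's code, and `NUM_EXAMPLES` is an illustrative placeholder, not the real dataset size.

```python
import math


def total_train_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def warmup_steps(num_training_steps: int, warmup_ratio: float) -> int:
    """Warmup length as the ceiling of ratio * total optimizer steps."""
    return math.ceil(num_training_steps * warmup_ratio)


def cosine_with_warmup_lr(step: int, num_training_steps: int, warmup: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then a half-cosine decay down to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, num_training_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    old_total = total_train_batch_size(8, 8)  # 64, the value before this commit
    new_total = total_train_batch_size(2, 8)  # 16, the value after this commit
    print("total_train_batch_size before/after:", old_total, new_total)

    # Hypothetical dataset size, for illustration only.
    NUM_EXAMPLES = 60_000
    # Approximate optimizer steps for one epoch at the corrected batch size.
    steps = math.ceil(NUM_EXAMPLES / new_total)
    warm = warmup_steps(steps, 0.1)
    for s in (0, warm // 2, warm, (steps + warm) // 2, steps):
        print(s, cosine_with_warmup_lr(s, steps, warm, 3e-7))
```

With four times fewer samples per step, the same data yields four times as many optimizer steps, so the warmup phase and the cosine decay stretch over a correspondingly longer step count.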