---
base_model: mistralai/Mistral-7B-v0.1
datasets:
- HuggingFaceH4/ultrafeedback_binarized
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: Mistral-7B-base-simpo-qlora
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/nlp-xiaobo/huggingface/runs/fgjarr1f)

# Mistral-7B-base-simpo-qlora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5543
- Rewards/chosen: -2.0201
- Rewards/rejected: -2.5529
- Rewards/accuracies: 0.6215
- Rewards/margins: 0.5328
- Logps/rejected: -1.2765
- Logps/chosen: -1.0100
- Logits/rejected: -2.1352
- Logits/chosen: -2.2380

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:------:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 1.6117        | 0.1047 | 400  | -2.2513       | -2.1455         | -0.9526      | -1.1212        | 1.6171          | 0.6010             | -1.9052        | 0.3373          | -2.2425          |
| 1.5829        | 0.2094 | 800  | -2.2393       | -2.1341         | -0.9938      | -1.2007        | 1.5888          | 0.6160             | -1.9876        | 0.4139          | -2.4015          |
| 1.5829        | 0.3141 | 1200 | -2.2356       | -2.1315         | -0.9915      | -1.2316        | 1.5656          | 0.6235             | -1.9830        | 0.4802          | -2.4632          |
| 1.6544        | 0.4187 | 1600 | -2.2392       | -2.1362         | -1.0204      | -1.2795        | 1.5601          | 0.6205             | -2.0408        | 0.5182          | -2.5590          |
| 1.4432        | 0.5234 | 2000 | -2.2398       | -2.1370         | -1.0143      | -1.2770        | 1.5560          | 0.6215             | -2.0287        | 0.5254          | -2.5541          |
| 1.5835        | 0.6281 | 2400 | -2.2387       | -2.1360         | -1.0393      | -1.3078        | 1.5582          | 0.6215             | -2.0787        | 0.5369          | -2.6156          |
| 1.5021        | 0.7328 | 2800 | -2.2395       | -2.1368         | -1.0048      | -1.2707        | 1.5540          | 0.6235             | -2.0096        | 0.5317          | -2.5414          |
| 1.6684        | 0.8375 | 3200 | -2.2405       | -2.1379         | -1.0095      | -1.2763        | 1.5542          | 0.6215             | -2.0191        | 0.5334          | -2.5525          |
| 1.5034        | 0.9422 | 3600 | -2.2372       | -2.1342         | -1.0110      | -1.2775        | 1.5546          | 0.6210             | -2.0219        | 0.5331          | -2.5550          |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
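
## SimPO objective (background)

The model name and the logged metrics indicate training with the SimPO objective rather than standard DPO (the `dpo` tag comes from the auto-generated card). As background, and not something recorded in this card's configuration, the SimPO loss of Meng et al. (2024) is reference-free and length-normalized:

$$
\mathcal{L}_{\text{SimPO}}(\pi_\theta) =
-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x)
-\frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x)
-\gamma
\right)\right]
$$

Read this way, `Logps/chosen` and `Logps/rejected` above are length-normalized log-probabilities and the `Rewards/*` columns are those values scaled by β. The consistent 2:1 ratio between rewards and log-probabilities in the tables would correspond to β ≈ 2.0, but β and γ are not stored in this card, so treat them as unknown.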
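
## How to use (sketch)

A minimal loading sketch follows. It is not part of the original card: the adapter repo id is a placeholder, and the 4-bit NF4 quantization merely mirrors a typical QLoRA inference setup rather than a documented configuration. `AutoPeftModelForCausalLM` reads the base model from the adapter's config, so you do not need to pick between `mistralai/Mistral-7B-v0.1` and the SFT checkpoint by hand.

```python
# Minimal usage sketch (assumptions flagged inline), using peft + transformers.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

# Placeholder repo id: point this at wherever the adapter weights are published.
adapter_id = "<your-namespace>/Mistral-7B-base-simpo-qlora"

# 4-bit NF4 loading is an assumption to keep memory close to the QLoRA training
# setup; drop quantization_config to load the base model in full precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    quantization_config=bnb_config,
    device_map="auto",
)
# If the adapter repo does not ship a tokenizer, load it from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Plain-text prompt for brevity; for chat-style prompting, use
# tokenizer.apply_chat_template if the repo provides a chat template.
prompt = "Explain the difference between DPO and SimPO in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```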