	---
	library_name: transformers
	license: apache-2.0
	base_model: alignment-handbook/zephyr-7b-sft-full
	tags:
	- alignment-handbook
	- trl
	- dpo
	- generated_from_trainer
	datasets:
	- data/zephyr_uf_rlced_conifer_ref_1e2e
	model-index:
	- name: zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e

	This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the data/zephyr_uf_rlced_conifer_ref_1e2e dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2626
	- Rewards/chosen: -2.1843
	- Rewards/rejected: -5.4288
	- Rewards/accuracies: 0.8684
	- Rewards/margins: 3.2445
	- Logps/rejected: -946.6157
	- Logps/chosen: -610.9032
	- Logits/rejected: 1.2318
	- Logits/chosen: -0.7806
	- Excess Loss: 0.0374
	- Alpha 0 Uf: 0.8470
	- Alpha 1 Rlced Conifer: 0.1530
	- Rewards/chosen 1 Rlced Conifer: -2.2281
	- Rewards/rejected 1 Rlced Conifer: -6.0246
	- Rewards/accuracies 1 Rlced Conifer: 0.8987
	- Rewards/margins 1 Rlced Conifer: 3.7965
	- Logps/rejected 1 Rlced Conifer: -1049.9939
	- Logps/chosen 1 Rlced Conifer: -646.3860
	- Logits/rejected 1 Rlced Conifer: 1.1158
	- Logits/chosen 1 Rlced Conifer: -0.9982
	- Task Loss 1 Rlced Conifer: 0.2102
	- Task Excess Loss 1 Rlced Conifer: 0.0475
	- Rewards/chosen 0 Uf: -1.9978
	- Rewards/rejected 0 Uf: -3.3091
	- Rewards/accuracies 0 Uf: 0.7603
	- Rewards/margins 0 Uf: 1.3113
	- Logps/rejected 0 Uf: -572.5212
	- Logps/chosen 0 Uf: -489.0419
	- Logits/rejected 0 Uf: 1.8243
	- Logits/chosen 0 Uf: -0.1004
	- Task Loss 0 Uf: 0.4944
	- Task Excess Loss 0 Uf: 0.0469

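The reported numbers can be cross-checked against each other. The sketch below assumes that each `Rewards/margins` value is the chosen reward minus the rejected reward, which is how DPO-style trainers usually log it. It also checks that the two `Alpha` values sum to 1. The card does not define the `Alpha` values, so reading them as per-group weights is an assumption. The dictionary keys and function names are illustrative only.

```python
import math

# Final evaluation metrics copied from the list above.
METRICS = {
    "overall": {"chosen": -2.1843, "rejected": -5.4288, "margin": 3.2445},
    "1_rlced_conifer": {"chosen": -2.2281, "rejected": -6.0246, "margin": 3.7965},
    "0_uf": {"chosen": -1.9978, "rejected": -3.3091, "margin": 1.3113},
}

# Reported Alpha values; interpreting them as group weights is an assumption.
ALPHAS = {"0_uf": 0.8470, "1_rlced_conifer": 0.1530}


def margin_from_rewards(chosen: float, rejected: float) -> float:
    """Reward margin as the gap between chosen and rejected rewards."""
    return chosen - rejected


def check_metrics(tol: float = 1e-3) -> bool:
    """Return True if every reported margin matches chosen - rejected
    and the Alpha values sum to 1 (both within `tol`)."""
    for values in METRICS.values():
        derived = margin_from_rewards(values["chosen"], values["rejected"])
        if not math.isclose(derived, values["margin"], abs_tol=tol):
            return False
    return math.isclose(sum(ALPHAS.values()), 1.0, abs_tol=tol)


if __name__ == "__main__":
    print("metrics consistent:", check_metrics())
```

The tolerance is set to `1e-3` because the card rounds every value to four decimal places.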
	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-07
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 256
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 2

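The derived batch sizes follow from the per-device values, and the learning-rate schedule can be approximated in plain Python. This is a minimal sketch, not the Trainer's own code. It assumes the schedule is linear warmup followed by a half-cosine decay to zero, and that the run had 1440 optimizer steps in total. That figure is the last step logged in the training results, not a number the card states. `cosine_lr` and the constants are hypothetical names.

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 8
PER_DEVICE_EVAL_BATCH_SIZE = 8
NUM_DEVICES = 8
GRADIENT_ACCUMULATION_STEPS = 4
PEAK_LR = 5e-07
WARMUP_RATIO = 0.1

# Assumption: total optimizer steps equals the last logged step.
TOTAL_STEPS = 1440

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = (
    PER_DEVICE_TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS
)
total_eval_batch_size = PER_DEVICE_EVAL_BATCH_SIZE * NUM_DEVICES


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("train batch:", total_train_batch_size)
    print("eval batch:", total_eval_batch_size)
    for step in (0, 144, 720, 1440):
        print(step, cosine_lr(step))
```

Under these assumptions, warmup lasts for the first 144 steps, so the peak rate of 5e-07 is reached before the first evaluation at step 360.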
	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Excess Loss | Alpha 0 Uf | Alpha 1 Rlced Conifer | Rewards/chosen 1 Rlced Conifer | Rewards/rejected 1 Rlced Conifer | Rewards/accuracies 1 Rlced Conifer | Rewards/margins 1 Rlced Conifer | Logps/rejected 1 Rlced Conifer | Logps/chosen 1 Rlced Conifer | Logits/rejected 1 Rlced Conifer | Logits/chosen 1 Rlced Conifer | Task Loss 1 Rlced Conifer | Task Excess Loss 1 Rlced Conifer | Rewards/chosen 0 Uf | Rewards/rejected 0 Uf | Rewards/accuracies 0 Uf | Rewards/margins 0 Uf | Logps/rejected 0 Uf | Logps/chosen 0 Uf | Logits/rejected 0 Uf | Logits/chosen 0 Uf | Task Loss 0 Uf | Task Excess Loss 0 Uf |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:-----------:|:----------:|:---------------------:|:------------------------------:|:--------------------------------:|:----------------------------------:|:-------------------------------:|:------------------------------:|:----------------------------:|:-------------------------------:|:-----------------------------:|:-------------------------:|:--------------------------------:|:-------------------:|:---------------------:|:-----------------------:|:--------------------:|:-------------------:|:-----------------:|:--------------------:|:------------------:|:--------------:|:---------------------:|
	| 0.1953 | 0.4997 | 360 | 0.3535 | -1.5938 | -3.1996 | 0.8402 | 1.6058 | -723.6984 | -551.8521 | 0.1112 | -0.7863 | 0.1136 | 0.9694 | 0.0306 | -1.5989 | -3.4179 | 0.8677 | 1.8190 | -789.3262 | -583.4747 | -0.1145 | -0.9516 | 0.3087 | 0.1414 | -1.5520 | -2.3972 | 0.7448 | 0.8452 | -481.3242 | -444.4588 | 1.0137 | -0.2527 | 0.5289 | 0.0768 |
	| 0.1537 | 0.9993 | 720 | 0.3329 | -1.4289 | -3.2979 | 0.8609 | 1.8690 | -733.5210 | -535.3586 | 0.6830 | -0.5276 | 0.0943 | 0.9852 | 0.0148 | -1.4038 | -3.4887 | 0.8869 | 2.0849 | -796.4048 | -563.9600 | 0.3914 | -0.7372 | 0.2955 | 0.1278 | -1.4972 | -2.5982 | 0.7618 | 1.1009 | -501.4233 | -438.9818 | 1.8477 | 0.1514 | 0.4804 | 0.0530 |
	| 0.0667 | 1.4990 | 1080 | 0.2667 | -2.1402 | -5.1839 | 0.8656 | 3.0437 | -922.1221 | -606.4852 | 1.0002 | -0.7884 | 0.0408 | 0.8954 | 0.1046 | -2.1729 | -5.7323 | 0.8964 | 3.5594 | -1020.7665 | -640.8754 | 0.8903 | -0.9784 | 0.2150 | 0.0521 | -1.9916 | -3.2293 | 0.7574 | 1.2377 | -564.5363 | -488.4239 | 1.5582 | -0.1961 | 0.4940 | 0.0466 |
	| 0.06 | 1.9986 | 1440 | 0.2626 | -2.1843 | -5.4288 | 0.8684 | 3.2445 | -946.6157 | -610.9032 | 1.2318 | -0.7806 | 0.0374 | 0.8470 | 0.1530 | -2.2281 | -6.0246 | 0.8987 | 3.7965 | -1049.9939 | -646.3860 | 1.1158 | -0.9982 | 0.2102 | 0.0475 | -1.9978 | -3.3091 | 0.7603 | 1.3113 | -572.5212 | -489.0419 | 1.8243 | -0.1004 | 0.4944 | 0.0469 |

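The table above is wide, so a short script may help when comparing checkpoints. The sketch below copies four columns (step, validation loss, `Rewards/accuracies`, `Rewards/margins`) from the four logged evaluations and reports the checkpoint with the lowest validation loss, along with the change between consecutive evaluations. The helper names are illustrative and not part of any library.

```python
# (step, validation loss, rewards/accuracies, rewards/margins),
# copied from the training results table.
CHECKPOINTS = [
    (360, 0.3535, 0.8402, 1.6058),
    (720, 0.3329, 0.8609, 1.8690),
    (1080, 0.2667, 0.8656, 3.0437),
    (1440, 0.2626, 0.8684, 3.2445),
]

STEP, VAL_LOSS, ACCURACY, MARGIN = range(4)


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[VAL_LOSS])


def deltas(rows, col):
    """Change in column `col` between consecutive evaluations."""
    return [round(b[col] - a[col], 4) for a, b in zip(rows, rows[1:])]


if __name__ == "__main__":
    print("best step:", best_checkpoint(CHECKPOINTS)[STEP])
    print("validation loss deltas:", deltas(CHECKPOINTS, VAL_LOSS))
    print("accuracy deltas:", deltas(CHECKPOINTS, ACCURACY))
    print("margin deltas:", deltas(CHECKPOINTS, MARGIN))
```

On these four rows, validation loss falls at every evaluation while reward accuracy and margin rise, with the largest change between steps 720 and 1080.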

	### Framework versions

	- Transformers 4.44.1
	- Pytorch 2.1.2+cu121
	- Datasets 2.21.0
	- Tokenizers 0.19.1