	---
	library_name: transformers
	license: apache-2.0
	base_model: alignment-handbook/zephyr-7b-sft-full
	tags:
	- alignment-handbook
	- trl
	- dpo
	- generated_from_trainer
	datasets:
	- data/zephyr_uf_rlced_conifer_ref
	model-index:
	- name: zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01

	This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the data/zephyr_uf_rlced_conifer_ref dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2395
	- Rewards/chosen: -2.8511
	- Rewards/rejected: -8.5888
	- Rewards/accuracies: 0.8778
	- Rewards/margins: 5.7377
	- Logps/rejected: -1262.6172
	- Logps/chosen: -677.5837
	- Logits/rejected: 3.8778
	- Logits/chosen: 1.9376
	- Excess Loss: 0.0374
	- Alpha 0 Uf: 0.5116
	- Alpha 1 Rlced Conifer: 0.4884
	- Rewards/chosen 1 Rlced Conifer: -3.0535
	- Rewards/rejected 1 Rlced Conifer: -10.0348
	- Rewards/accuracies 1 Rlced Conifer: 0.9097
	- Rewards/margins 1 Rlced Conifer: 6.9812
	- Logps/rejected 1 Rlced Conifer: -1451.0132
	- Logps/chosen 1 Rlced Conifer: -728.9337
	- Logits/rejected 1 Rlced Conifer: 3.5676
	- Logits/chosen 1 Rlced Conifer: 1.5730
	- Task Loss 1 Rlced Conifer: 0.1787
	- Task Excess Loss 1 Rlced Conifer: 0.0427
	- Rewards/chosen 0 Uf: -2.0820
	- Rewards/rejected 0 Uf: -3.4336
	- Rewards/accuracies 0 Uf: 0.7633
	- Rewards/margins 0 Uf: 1.3516
	- Logps/rejected 0 Uf: -584.9677
	- Logps/chosen 0 Uf: -497.4562
	- Logits/rejected 0 Uf: 5.1753
	- Logits/chosen 0 Uf: 3.1000
	- Task Loss 0 Uf: 0.5185
	- Task Excess Loss 0 Uf: 0.0724
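
As a quick consistency check on the figures above, the sketch below recomputes each reward margin as chosen minus rejected and compares it with the reported value. It assumes the margin is logged as the mean difference between chosen and rejected rewards, as in the standard TRL DPO metrics; the trainer used for this model may compute it differently. The dictionary keys and the helper name `margin_gaps` are illustrative only.

```python
# Final evaluation rewards copied from the list above:
# (rewards/chosen, rewards/rejected, reported rewards/margins).
REPORTED = {
    "overall": (-2.8511, -8.5888, 5.7377),
    "rlced_conifer": (-3.0535, -10.0348, 6.9812),
    "uf": (-2.0820, -3.4336, 1.3516),
}


def margin_gaps(reported):
    """Return |(chosen - rejected) - reported margin| for each group."""
    return {
        name: abs((chosen - rejected) - margin)
        for name, (chosen, rejected, margin) in reported.items()
    }


gaps = margin_gaps(REPORTED)
for name, gap in gaps.items():
    # Gaps on the order of 1e-4 are expected from rounding to four decimals.
    print(f"{name}: |recomputed - reported| = {gap:.4f}")
```

Under that assumption, all three groups agree with the reported margins to within rounding.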

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-07
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 256
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 2
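
The batch-size and scheduler entries above can be related with a short sketch. The first part reproduces the reported total batch sizes from the per-device values. The second part is a generic linear-warmup-then-cosine-decay curve, not the trainer's own code: the function name `cosine_lr` is hypothetical, and the total of 1440 optimizer steps is an assumption taken from the last logged step in the training results.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_BATCH = 8
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 4
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1

# Effective batch sizes: per-device batch x devices (x accumulation for training).
total_train_batch_size = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = PER_DEVICE_BATCH * NUM_DEVICES


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then a half-cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1440  # assumption: last step logged in the training results
schedule = [cosine_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
print(total_train_batch_size, total_eval_batch_size)
print(f"peak lr = {max(schedule):.2e}, final lr = {schedule[-1]:.2e}")
```

With these assumptions the learning rate rises over roughly the first tenth of training, peaks at 5e-07, and decays to about zero by the final step.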

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Excess Loss | Alpha 0 Uf | Alpha 1 Rlced Conifer | Rewards/chosen 1 Rlced Conifer | Rewards/rejected 1 Rlced Conifer | Rewards/accuracies 1 Rlced Conifer | Rewards/margins 1 Rlced Conifer | Logps/rejected 1 Rlced Conifer | Logps/chosen 1 Rlced Conifer | Logits/rejected 1 Rlced Conifer | Logits/chosen 1 Rlced Conifer | Task Loss 1 Rlced Conifer | Task Excess Loss 1 Rlced Conifer | Rewards/chosen 0 Uf | Rewards/rejected 0 Uf | Rewards/accuracies 0 Uf | Rewards/margins 0 Uf | Logps/rejected 0 Uf | Logps/chosen 0 Uf | Logits/rejected 0 Uf | Logits/chosen 0 Uf | Task Loss 0 Uf | Task Excess Loss 0 Uf |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:-----------:|:----------:|:---------------------:|:------------------------------:|:--------------------------------:|:----------------------------------:|:-------------------------------:|:------------------------------:|:----------------------------:|:-------------------------------:|:-----------------------------:|:-------------------------:|:--------------------------------:|:-------------------:|:---------------------:|:-----------------------:|:--------------------:|:-------------------:|:-----------------:|:--------------------:|:------------------:|:--------------:|:---------------------:|
	| 0.1689 | 0.4997 | 360 | 0.2674 | -2.2066 | -5.7976 | 0.8656 | 3.5910 | -983.4942 | -613.1316 | 1.9639 | 0.4895 | 0.0642 | 0.5765 | 0.4235 | -2.3017 | -6.6520 | 0.8965 | 4.3503 | -1112.7397 | -653.7553 | 1.7066 | 0.1879 | 0.2091 | 0.0748 | -1.8461 | -2.7792 | 0.7426 | 0.9330 | -519.5245 | -473.8738 | 3.0556 | 1.4702 | 0.5392 | 0.0891 |
	| 0.1413 | 0.9993 | 720 | 0.2485 | -2.0138 | -6.1196 | 0.8741 | 4.1059 | -1015.6987 | -593.8471 | 2.5252 | 1.3345 | 0.0465 | 0.6417 | 0.3583 | -2.0972 | -7.0507 | 0.9047 | 4.9535 | -1152.6036 | -633.2974 | 2.1536 | 1.0120 | 0.1925 | 0.0584 | -1.6822 | -2.7943 | 0.7670 | 1.1121 | -521.0374 | -457.4840 | 4.0168 | 2.3771 | 0.4989 | 0.0595 |
	| 0.0671 | 1.4990 | 1080 | 0.2408 | -2.5432 | -7.7524 | 0.8741 | 5.2092 | -1178.9786 | -646.7894 | 3.9871 | 2.3348 | 0.0389 | 0.5284 | 0.4716 | -2.6717 | -8.9931 | 0.9071 | 6.3215 | -1346.8500 | -690.7497 | 3.5948 | 1.9516 | 0.1822 | 0.0462 | -2.0401 | -3.3250 | 0.7500 | 1.2849 | -574.1076 | -493.2740 | 5.5773 | 3.5557 | 0.5197 | 0.0655 |
	| 0.0649 | 1.9986 | 1440 | 0.2395 | -2.8511 | -8.5888 | 0.8778 | 5.7377 | -1262.6172 | -677.5837 | 3.8778 | 1.9376 | 0.0374 | 0.5116 | 0.4884 | -3.0535 | -10.0348 | 0.9097 | 6.9812 | -1451.0132 | -728.9337 | 3.5676 | 1.5730 | 0.1787 | 0.0427 | -2.0820 | -3.4336 | 0.7633 | 1.3516 | -584.9677 | -497.4562 | 5.1753 | 3.1000 | 0.5185 | 0.0724 |
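
The Epoch and Step columns also allow a rough estimate of the training-set size, which the card does not state. The sketch below divides each logged step by its fractional epoch to estimate optimizer steps per epoch, then multiplies by the total train batch size of 256. This is only an approximation under the assumption that every step consumes a full batch; the helper name `steps_per_epoch` is hypothetical.

```python
# (epoch, step) pairs copied from the evaluation rows of the table above.
CHECKPOINTS = [(0.4997, 360), (0.9993, 720), (1.4990, 1080), (1.9986, 1440)]
TOTAL_TRAIN_BATCH_SIZE = 256  # from the hyperparameter list


def steps_per_epoch(checkpoints):
    """Estimate optimizer steps per epoch from each (epoch, step) pair."""
    return [step / epoch for epoch, step in checkpoints]


estimates = steps_per_epoch(CHECKPOINTS)
mean_steps = sum(estimates) / len(estimates)

# Rough number of training examples seen per epoch (assumes full batches).
approx_examples = mean_steps * TOTAL_TRAIN_BATCH_SIZE
print(f"about {mean_steps:.1f} steps per epoch, about {approx_examples:,.0f} examples per epoch")
```

The four rows give nearly identical estimates, which suggests evaluation ran at evenly spaced intervals of about half an epoch.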


	### Framework versions

	- Transformers 4.44.2
	- PyTorch 2.2.0a0+81ea7a4
	- Datasets 2.21.0
	- Tokenizers 0.19.1