NicholasCorrado/zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e

+---
+library_name: transformers
+license: apache-2.0
+base_model: alignment-handbook/zephyr-7b-sft-full
+tags:
+- trl
+- dpo
+- generated_from_trainer
+model-index:
+- name: zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.2626
+- Rewards/chosen: -2.1843
+- Rewards/rejected: -5.4288
+- Rewards/accuracies: 0.8684
+- Rewards/margins: 3.2445
+- Logps/rejected: -946.6157
+- Logps/chosen: -610.9032
+- Logits/rejected: 1.2318
+- Logits/chosen: -0.7806
+- Excess Loss: 0.0374
+- Alpha 0 Uf: 0.8470
+- Alpha 1 Rlced Conifer: 0.1530
+- Rewards/chosen 1 Rlced Conifer: -2.2281
+- Rewards/rejected 1 Rlced Conifer: -6.0246
+- Rewards/accuracies 1 Rlced Conifer: 0.8987
+- Rewards/margins 1 Rlced Conifer: 3.7965
+- Logps/rejected 1 Rlced Conifer: -1049.9939
+- Logps/chosen 1 Rlced Conifer: -646.3860
+- Logits/rejected 1 Rlced Conifer: 1.1158
+- Logits/chosen 1 Rlced Conifer: -0.9982
+- Task Loss 1 Rlced Conifer: 0.2102
+- Task Excess Loss 1 Rlced Conifer: 0.0475
+- Rewards/chosen 0 Uf: -1.9978
+- Rewards/rejected 0 Uf: -3.3091
+- Rewards/accuracies 0 Uf: 0.7603
+- Rewards/margins 0 Uf: 1.3113
+- Logps/rejected 0 Uf: -572.5212
+- Logps/chosen 0 Uf: -489.0419
+- Logits/rejected 0 Uf: 1.8243
+- Logits/chosen 0 Uf: -0.1004
+- Task Loss 0 Uf: 0.4944
+- Task Excess Loss 0 Uf: 0.0469
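
The reward figures above can be sanity-checked against each other. The sketch below assumes the usual DPO logging convention in which the reported margin is the chosen reward minus the rejected reward; the dictionary keys and the `margin` helper are illustrative names, not part of the training code. It also checks that the two `Alpha` values sum to 1, without assuming anything further about how they are used.

```python
# Final-checkpoint values copied from the evaluation list above.
# Assumption: "Rewards/margins" is logged as chosen minus rejected.
metrics = {
    "overall": {"chosen": -2.1843, "rejected": -5.4288, "margin": 3.2445},
    "rlced_conifer": {"chosen": -2.2281, "rejected": -6.0246, "margin": 3.7965},
    "uf": {"chosen": -1.9978, "rejected": -3.3091, "margin": 1.3113},
}


def margin(chosen: float, rejected: float) -> float:
    """Reward margin between the chosen and the rejected completion."""
    return chosen - rejected


for name, m in metrics.items():
    computed = margin(m["chosen"], m["rejected"])
    # Reported values are rounded to four decimals, so compare with a tolerance.
    consistent = abs(computed - m["margin"]) < 1e-4
    print(f"{name}: computed={computed:.4f} reported={m['margin']:.4f} consistent={consistent}")

# The two per-group alpha values reported above.
alphas = {"uf": 0.8470, "rlced_conifer": 0.1530}
alpha_sum = sum(alphas.values())
print(f"alpha sum = {alpha_sum:.4f}")
```
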
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-07
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 256
+- total_eval_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 2
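
The total batch sizes listed above follow from the per-device batch size, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and adds a rough model of a cosine schedule with linear warmup. The `lr_at` helper is hypothetical, and `total_steps = 1440` is an assumption taken from the last step logged in the training results table, so the schedule is only an approximation of what the trainer actually ran.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-07
train_batch_size = 8
eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 4
warmup_ratio = 0.1

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 256 64

# Assumption: the run has 1440 optimizer steps (last step in the results table).
total_steps = 1440
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Illustrative cosine schedule with linear warmup (hypothetical helper)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```
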
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Excess Loss | Alpha 0 Uf | Alpha 1 Rlced Conifer | Rewards/chosen 1 Rlced Conifer | Rewards/rejected 1 Rlced Conifer | Rewards/accuracies 1 Rlced Conifer | Rewards/margins 1 Rlced Conifer | Logps/rejected 1 Rlced Conifer | Logps/chosen 1 Rlced Conifer | Logits/rejected 1 Rlced Conifer | Logits/chosen 1 Rlced Conifer | Task Loss 1 Rlced Conifer | Task Excess Loss 1 Rlced Conifer | Rewards/chosen 0 Uf | Rewards/rejected 0 Uf | Rewards/accuracies 0 Uf | Rewards/margins 0 Uf | Logps/rejected 0 Uf | Logps/chosen 0 Uf | Logits/rejected 0 Uf | Logits/chosen 0 Uf | Task Loss 0 Uf | Task Excess Loss 0 Uf |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:-----------:|:----------:|:---------------------:|:------------------------------:|:--------------------------------:|:----------------------------------:|:-------------------------------:|:------------------------------:|:----------------------------:|:-------------------------------:|:-----------------------------:|:-------------------------:|:--------------------------------:|:-------------------:|:---------------------:|:-----------------------:|:--------------------:|:-------------------:|:-----------------:|:--------------------:|:------------------:|:--------------:|:---------------------:|
+| 0.1953        | 0.4997 | 360  | 0.3535          | -1.5938        | -3.1996          | 0.8402             | 1.6058          | -723.6984      | -551.8521    | 0.1112          | -0.7863       | 0.1136      | 0.9694     | 0.0306                | -1.5989                        | -3.4179                          | 0.8677                             | 1.8190                          | -789.3262                      | -583.4747                    | -0.1145                         | -0.9516                       | 0.3087                    | 0.1414                           | -1.5520             | -2.3972               | 0.7448                  | 0.8452               | -481.3242           | -444.4588         | 1.0137               | -0.2527            | 0.5289         | 0.0768                |
+| 0.1537        | 0.9993 | 720  | 0.3329          | -1.4289        | -3.2979          | 0.8609             | 1.8690          | -733.5210      | -535.3586    | 0.6830          | -0.5276       | 0.0943      | 0.9852     | 0.0148                | -1.4038                        | -3.4887                          | 0.8869                             | 2.0849                          | -796.4048                      | -563.9600                    | 0.3914                          | -0.7372                       | 0.2955                    | 0.1278                           | -1.4972             | -2.5982               | 0.7618                  | 1.1009               | -501.4233           | -438.9818         | 1.8477               | 0.1514             | 0.4804         | 0.0530                |
+| 0.0667        | 1.4990 | 1080 | 0.2667          | -2.1402        | -5.1839          | 0.8656             | 3.0437          | -922.1221      | -606.4852    | 1.0002          | -0.7884       | 0.0408      | 0.8954     | 0.1046                | -2.1729                        | -5.7323                          | 0.8964                             | 3.5594                          | -1020.7665                     | -640.8754                    | 0.8903                          | -0.9784                       | 0.2150                    | 0.0521                           | -1.9916             | -3.2293               | 0.7574                  | 1.2377               | -564.5363           | -488.4239         | 1.5582               | -0.1961            | 0.4940         | 0.0466                |
+| 0.06          | 1.9986 | 1440 | 0.2626          | -2.1843        | -5.4288          | 0.8684             | 3.2445          | -946.6157      | -610.9032    | 1.2318          | -0.7806       | 0.0374      | 0.8470     | 0.1530                | -2.2281                        | -6.0246                          | 0.8987                             | 3.7965                          | -1049.9939                     | -646.3860                    | 1.1158                          | -0.9982                       | 0.2102                    | 0.0475                           | -1.9978             | -3.3091               | 0.7603                  | 1.3113               | -572.5212           | -489.0419         | 1.8243               | -0.1004            | 0.4944         | 0.0469                |
+### Framework versions
+- Transformers 4.44.1
+- Pytorch 2.1.2+cu121
+- Datasets 2.21.0
+- Tokenizers 0.19.1

all_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 1.9986120749479528,
+    "total_flos": 0.0,
+    "train_loss": 0.15370953861210082,
+    "train_runtime": 41916.6173,
+    "train_samples": 184443,
+    "train_samples_per_second": 8.8,
+    "train_steps_per_second": 0.034
+}
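
The throughput fields in this file can be roughly reproduced from the runtime and sample count. This is a sketch under two assumptions not stated in the JSON itself: that samples per second is computed as the number of training samples times the number of epochs divided by the runtime, and that the run took 1440 optimizer steps (the last step in the model card's results table).

```python
# Values copied from all_results.json above.
train_runtime = 41916.6173  # seconds
train_samples = 184443

# Assumptions taken from the model card, not from this JSON file.
num_epochs = 2
final_step = 1440

samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = final_step / train_runtime
runtime_hours = train_runtime / 3600

print(f"samples/s ~ {samples_per_second:.3f}")
print(f"steps/s   ~ {steps_per_second:.4f}")
print(f"runtime   ~ {runtime_hours:.2f} h")
```

Rounded to the precision used in the file, both rates agree with the reported `train_samples_per_second` and `train_steps_per_second`.
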

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "transformers_version": "4.44.1"
+}

train_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 1.9986120749479528,
+    "total_flos": 0.0,
+    "train_loss": 0.15370953861210082,
+    "train_runtime": 41916.6173,
+    "train_samples": 184443,
+    "train_samples_per_second": 8.8,
+    "train_steps_per_second": 0.034
+}