Model save

Files changed (11)

README.md ADDED

+---
+base_model: mistralai/Mistral-7B-v0.1
+datasets:
+- generator
+library_name: peft
+license: apache-2.0
+tags:
+- trl
+- sft
+- generated_from_trainer
+model-index:
+- name: zephyr-7b-sft-qlora
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# zephyr-7b-sft-qlora
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0002
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 32
+- total_eval_batch_size: 32
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+### Training results
+### Framework versions
+- PEFT 0.13.0
+- Transformers 4.45.1
+- Pytorch 2.4.1+cu121
+- Datasets 3.0.1
+- Tokenizers 0.20.0
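
The total batch sizes reported in the README hyperparameters follow from the per-device sizes, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic, assuming the usual convention that gradient accumulation applies to training only (the helper name `effective_batch_size` is hypothetical, not part of any library):

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step (or one eval pass) across all devices."""
    return per_device * num_devices * grad_accum_steps


# Values copied from the README hyperparameter list.
total_train = effective_batch_size(per_device=4, num_devices=4, grad_accum_steps=2)
total_eval = effective_batch_size(per_device=8, num_devices=4)  # no accumulation at eval time

print(total_train, total_eval)
```

Both values come out to 32, matching `total_train_batch_size` and `total_eval_batch_size` in the model card.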

adapter_config.json CHANGED

@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "gate_proj",
+    "up_proj",
     "k_proj",
+    "v_proj",
     "q_proj",
     "o_proj",
-    "up_proj",
     "down_proj",
-    "v_proj"
+    "gate_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

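The old and new `target_modules` lists in the adapter_config.json diff contain the same seven entries in a different order. A likely cause, stated here as an assumption, is that the configuration holds the modules as an unordered set before serializing them. A small sketch that confirms the change is order-only by comparing the two lists as sets (the variable names are illustrative):

```python
# Module names copied from the removed/unchanged and added/unchanged lines of the diff.
old_target_modules = ["gate_proj", "k_proj", "q_proj", "o_proj", "up_proj", "down_proj", "v_proj"]
new_target_modules = ["up_proj", "k_proj", "v_proj", "q_proj", "o_proj", "down_proj", "gate_proj"]

# Order-insensitive comparison: an empty symmetric difference means no module was added or dropped.
changed_modules = set(old_target_modules) ^ set(new_target_modules)
order_only_change = not changed_modules and old_target_modules != new_target_modules

print(sorted(changed_modules), order_only_change)
```

If the symmetric difference is empty, the adapter targets the same layers before and after this commit.
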
all_results.json ADDED

+{
+    "epoch": 0.265343793262575,
+    "total_flos": 3.2343958172802744e+18,
+    "train_loss": 0.0,
+    "train_runtime": 0.0325,
+    "train_samples": 207864,
+    "train_samples_per_second": 4262129.534,
+    "train_steps_per_second": 133192.508
+}
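
The metrics above look unusual: a `train_loss` of exactly 0.0, a `train_runtime` of about 0.03 seconds, and an `epoch` value well below the single epoch configured in the README. One possible explanation, which this diff does not confirm, is that the run resumed from an existing checkpoint and executed no new steps. A hedged sketch of a sanity check for such files; the function name `flag_suspicious` and its thresholds are hypothetical choices, not part of any library:

```python
import json

# Contents of all_results.json as shown in the diff.
results = json.loads("""
{
    "epoch": 0.265343793262575,
    "total_flos": 3.2343958172802744e+18,
    "train_loss": 0.0,
    "train_runtime": 0.0325,
    "train_samples": 207864,
    "train_samples_per_second": 4262129.534,
    "train_steps_per_second": 133192.508
}
""")


def flag_suspicious(metrics: dict, expected_epochs: float = 1.0, min_runtime_s: float = 1.0) -> list[str]:
    """Return human-readable warnings for metrics that suggest no real training happened."""
    warnings = []
    if metrics.get("train_loss") == 0.0:
        warnings.append("train_loss is exactly 0.0")
    if metrics.get("train_runtime", 0.0) < min_runtime_s:
        warnings.append(f"train_runtime is below {min_runtime_s} s")
    if metrics.get("epoch", 0.0) < expected_epochs:
        warnings.append(f"epoch is below the expected {expected_epochs}")
    return warnings


flags = flag_suspicious(results)
for message in flags:
    print("WARNING:", message)
```

All three checks fire for this file, so the throughput figures it reports should not be read as real training speed.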

runs/Oct06_07-55-18_dilara/events.out.tfevents.1728201862.dilara.627613.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bd6fbd8337d66078c540bb92e09d6763a66989d97b3800c87f087098515403ff
-size 200754
+oid sha256:3a8119cb8dfc119eaee292a79913456dae3505a4fb4e4556b754c3c8eaf6d703
+size 204763

runs/Oct06_21-37-11_dilara/events.out.tfevents.1728250690.dilara.742626.0 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:13dd8d3f097694569cede21c3cec370b76130b1f691efe9fc70bcb9a2f4daba0
+size 6738

runs/Oct06_21-41-27_dilara/events.out.tfevents.1728250899.dilara.743536.0 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:12db8b7dd5c712de7f101488dcedb0de0848c495f11d5c678bc4f6b0aac3d487
+size 7160

runs/Oct06_21-44-57_dilara/events.out.tfevents.1728251109.dilara.744344.0 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:cec09ffd6b22f934e580c9b7fdf9a50d0a83a4c8228c2772439950a9bf4b67ec
+size 6527

runs/Oct06_21-46-11_dilara/events.out.tfevents.1728251187.dilara.744686.0 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:2302c13c66ee724bbc5e579b1b014f3976d03b1b1e3103cfd4af454b1dd61da7
+size 6881

train_results.json ADDED

+{
+    "epoch": 0.265343793262575,
+    "total_flos": 3.2343958172802744e+18,
+    "train_loss": 0.0,
+    "train_runtime": 0.0325,
+    "train_samples": 207864,
+    "train_samples_per_second": 4262129.534,
+    "train_steps_per_second": 133192.508
+}

trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:33dfa6f83c69ca72a5024d29717d7d10b2bf0125fc370d832a0006a17d840676
+oid sha256:8eb7ac877db3a33453257087f46ab346c567b377c8ca75b9722492890ddb4102
 size 6264