lesso committed 15 days ago

Commit 7da7344 (verified) · Parent: 7edcb57

Training in progress, step 50, checkpoint

Files changed (20)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state_0.pth +3 -0
last-checkpoint/rng_state_1.pth +3 -0
last-checkpoint/rng_state_2.pth +3 -0
last-checkpoint/rng_state_3.pth +3 -0
last-checkpoint/rng_state_4.pth +3 -0
last-checkpoint/rng_state_5.pth +3 -0
last-checkpoint/rng_state_6.pth +3 -0
last-checkpoint/rng_state_7.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +0 -0
last-checkpoint/trainer_state.json +408 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Korabbit/llama-2-ko-7b
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Korabbit/llama-2-ko-7b",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
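
To put the adapter settings above in context, here is a minimal Python sketch that computes the LoRA scaling factor (`lora_alpha / r`, since `use_rslora` is false) and estimates how many adapter parameters these `target_modules` imply. The layer shapes (hidden size 4096, MLP size 11008, 32 decoder layers) are assumptions based on the standard Llama-2-7B architecture; they are not stated anywhere in this commit, and the helper names are hypothetical.

```python
# Values copied from adapter_config.json in this commit.
R = 64
LORA_ALPHA = 128
TARGET_MODULES = [
    "v_proj", "down_proj", "up_proj", "k_proj", "o_proj", "q_proj", "gate_proj",
]

# ASSUMPTION: standard Llama-2-7B dimensions (not stated in this commit).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

# (in_features, out_features) of each targeted linear layer under that assumption.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha: int, r: int) -> float:
    """Standard (non-rsLoRA) scaling applied to the low-rank update."""
    return alpha / r


def lora_param_count(r: int, modules: list, shapes: dict, num_layers: int) -> int:
    """Each adapted layer adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in modules)
    return per_layer * num_layers


scaling = lora_scaling(LORA_ALPHA, R)
n_params = lora_param_count(R, TARGET_MODULES, SHAPES, NUM_LAYERS)
fp32_bytes = n_params * 4

print(f"scaling = {scaling}")
print(f"adapter parameters = {n_params:,}")
print(f"size at 4 bytes/param = {fp32_bytes:,} bytes")
```

Under these assumptions the estimate is 159,907,840 parameters, or 639,631,360 bytes at 4 bytes each. That is close to the 639,691,872-byte size recorded for `adapter_model.safetensors` in this commit, which would be consistent with fp32 adapter weights plus a small file header, although the commit itself does not state the dtype.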

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:74c0a6974b9dfa587c25b8ac5169ea3483427c2b8aaaec8abb0f75ad40333690
+size 639691872
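
The binary files in this commit appear only as three-line Git LFS pointers (`version`, `oid`, `size`) rather than as their actual contents. As a rough illustration, the sketch below parses such a pointer into its fields; `parse_lfs_pointer` is a hypothetical helper written for this example, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer text copied from the adapter_model.safetensors diff in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:74c0a6974b9dfa587c25b8ac5169ea3483427c2b8aaaec8abb0f75ad40333690
size 639691872
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The `size` field is what distinguishes the large artifacts (adapter weights, optimizer state) from the small ones (RNG states, scheduler state) when only the pointers are visible.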

last-checkpoint/added_tokens.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e92c8eeaace6e4d17367dca250b938bbab93874e37b42a59b35dc488e6d949e3
+size 325339796

last-checkpoint/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d80ac4a4f8f9ee807a12f4bf6c8c66b2f9ed45791e58cce13a48c458a454cb19
+size 15984

last-checkpoint/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4bea2be86d603be08db611ee0bd6ef9449e69e878dd793a0b5d67c1c2d5fb2a8
+size 15984

last-checkpoint/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e0690e253620d890307206e870c41ff489702351e31b8958dd6c46cac113ddda
+size 15984

last-checkpoint/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dfc64f51dc71ff41ed978d7206ff87be28246ca338761bb6182422fbc767f05b
+size 15984

last-checkpoint/rng_state_4.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5a23179c55c451fe438c6dbfc3bec3938838017c65a786df7cb36f0ca87f7d71
+size 15984

last-checkpoint/rng_state_5.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:10893f847939b5ebdf704328d076b5b3a3cac275ac16a33ed5dda3d8f99f0f1e
+size 15984

last-checkpoint/rng_state_6.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f7a6fc339d670c0a6119725e953d6d1b0c35838ee845bb8f1ae782b3e745d3ec
+size 15984

last-checkpoint/rng_state_7.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b43c7d4ea6b6758adbcaccc94bbfa39d937be487a3e54af10747fe11358beb0
+size 15984

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4b7e7d044a338e6b420016be2e48e692869df520a6f768a0f5ba8de63e9bb378
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

last-checkpoint/tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,408 @@

+{
+  "best_metric": 0.9505019783973694,
+  "best_model_checkpoint": "miner_id_24/checkpoint-50",
+  "epoch": 0.022459292532285232,
+  "eval_steps": 50,
+  "global_step": 50,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00044918585064570465,
+      "grad_norm": 1.2916557788848877,
+      "learning_rate": 1.0100000000000002e-05,
+      "loss": 1.7692,
+      "step": 1
+    },
+    {
+      "epoch": 0.00044918585064570465,
+      "eval_loss": 1.9869955778121948,
+      "eval_runtime": 205.6388,
+      "eval_samples_per_second": 145.863,
+      "eval_steps_per_second": 4.561,
+      "step": 1
+    },
+    {
+      "epoch": 0.0008983717012914093,
+      "grad_norm": 1.607176661491394,
+      "learning_rate": 2.0200000000000003e-05,
+      "loss": 1.8371,
+      "step": 2
+    },
+    {
+      "epoch": 0.001347557551937114,
+      "grad_norm": 1.799497365951538,
+      "learning_rate": 3.0299999999999998e-05,
+      "loss": 1.8926,
+      "step": 3
+    },
+    {
+      "epoch": 0.0017967434025828186,
+      "grad_norm": 1.7198957204818726,
+      "learning_rate": 4.0400000000000006e-05,
+      "loss": 1.9009,
+      "step": 4
+    },
+    {
+      "epoch": 0.0022459292532285235,
+      "grad_norm": 1.7114357948303223,
+      "learning_rate": 5.05e-05,
+      "loss": 1.8579,
+      "step": 5
+    },
+    {
+      "epoch": 0.002695115103874228,
+      "grad_norm": 1.4471302032470703,
+      "learning_rate": 6.0599999999999996e-05,
+      "loss": 1.6855,
+      "step": 6
+    },
+    {
+      "epoch": 0.0031443009545199328,
+      "grad_norm": 0.8383511304855347,
+      "learning_rate": 7.07e-05,
+      "loss": 1.4356,
+      "step": 7
+    },
+    {
+      "epoch": 0.003593486805165637,
+      "grad_norm": 0.8857292532920837,
+      "learning_rate": 8.080000000000001e-05,
+      "loss": 1.3539,
+      "step": 8
+    },
+    {
+      "epoch": 0.004042672655811342,
+      "grad_norm": 0.7878719568252563,
+      "learning_rate": 9.09e-05,
+      "loss": 1.2892,
+      "step": 9
+    },
+    {
+      "epoch": 0.004491858506457047,
+      "grad_norm": 0.8838219046592712,
+      "learning_rate": 0.000101,
+      "loss": 1.2533,
+      "step": 10
+    },
+    {
+      "epoch": 0.004941044357102751,
+      "grad_norm": 0.8204696774482727,
+      "learning_rate": 0.00010046842105263158,
+      "loss": 1.2432,
+      "step": 11
+    },
+    {
+      "epoch": 0.005390230207748456,
+      "grad_norm": 0.6225486397743225,
+      "learning_rate": 9.993684210526315e-05,
+      "loss": 1.184,
+      "step": 12
+    },
+    {
+      "epoch": 0.00583941605839416,
+      "grad_norm": 0.45047685503959656,
+      "learning_rate": 9.940526315789473e-05,
+      "loss": 1.1476,
+      "step": 13
+    },
+    {
+      "epoch": 0.0062886019090398655,
+      "grad_norm": 0.43591365218162537,
+      "learning_rate": 9.887368421052632e-05,
+      "loss": 1.1241,
+      "step": 14
+    },
+    {
+      "epoch": 0.00673778775968557,
+      "grad_norm": 0.39855462312698364,
+      "learning_rate": 9.83421052631579e-05,
+      "loss": 1.1202,
+      "step": 15
+    },
+    {
+      "epoch": 0.007186973610331274,
+      "grad_norm": 0.32935649156570435,
+      "learning_rate": 9.781052631578948e-05,
+      "loss": 1.0963,
+      "step": 16
+    },
+    {
+      "epoch": 0.007636159460976979,
+      "grad_norm": 0.3512268364429474,
+      "learning_rate": 9.727894736842106e-05,
+      "loss": 1.0864,
+      "step": 17
+    },
+    {
+      "epoch": 0.008085345311622683,
+      "grad_norm": 0.36069127917289734,
+      "learning_rate": 9.674736842105263e-05,
+      "loss": 1.0866,
+      "step": 18
+    },
+    {
+      "epoch": 0.00853453116226839,
+      "grad_norm": 0.3602248728275299,
+      "learning_rate": 9.621578947368421e-05,
+      "loss": 1.0199,
+      "step": 19
+    },
+    {
+      "epoch": 0.008983717012914094,
+      "grad_norm": 0.3629809021949768,
+      "learning_rate": 9.568421052631578e-05,
+      "loss": 1.0797,
+      "step": 20
+    },
+    {
+      "epoch": 0.009432902863559798,
+      "grad_norm": 0.3103269934654236,
+      "learning_rate": 9.515263157894737e-05,
+      "loss": 1.0546,
+      "step": 21
+    },
+    {
+      "epoch": 0.009882088714205503,
+      "grad_norm": 0.30035093426704407,
+      "learning_rate": 9.462105263157895e-05,
+      "loss": 1.0643,
+      "step": 22
+    },
+    {
+      "epoch": 0.010331274564851207,
+      "grad_norm": 0.3492095172405243,
+      "learning_rate": 9.408947368421054e-05,
+      "loss": 1.0258,
+      "step": 23
+    },
+    {
+      "epoch": 0.010780460415496912,
+      "grad_norm": 0.4093751609325409,
+      "learning_rate": 9.355789473684211e-05,
+      "loss": 1.0746,
+      "step": 24
+    },
+    {
+      "epoch": 0.011229646266142616,
+      "grad_norm": 0.5422231554985046,
+      "learning_rate": 9.302631578947369e-05,
+      "loss": 0.975,
+      "step": 25
+    },
+    {
+      "epoch": 0.01167883211678832,
+      "grad_norm": 0.344453364610672,
+      "learning_rate": 9.249473684210526e-05,
+      "loss": 1.0594,
+      "step": 26
+    },
+    {
+      "epoch": 0.012128017967434027,
+      "grad_norm": 0.3349956274032593,
+      "learning_rate": 9.196315789473685e-05,
+      "loss": 1.0283,
+      "step": 27
+    },
+    {
+      "epoch": 0.012577203818079731,
+      "grad_norm": 0.30147019028663635,
+      "learning_rate": 9.143157894736843e-05,
+      "loss": 1.0116,
+      "step": 28
+    },
+    {
+      "epoch": 0.013026389668725435,
+      "grad_norm": 0.25005072355270386,
+      "learning_rate": 9.09e-05,
+      "loss": 0.9857,
+      "step": 29
+    },
+    {
+      "epoch": 0.01347557551937114,
+      "grad_norm": 0.262613981962204,
+      "learning_rate": 9.036842105263158e-05,
+      "loss": 1.0245,
+      "step": 30
+    },
+    {
+      "epoch": 0.013924761370016844,
+      "grad_norm": 0.35343340039253235,
+      "learning_rate": 8.983684210526316e-05,
+      "loss": 0.967,
+      "step": 31
+    },
+    {
+      "epoch": 0.014373947220662549,
+      "grad_norm": 0.24998489022254944,
+      "learning_rate": 8.930526315789474e-05,
+      "loss": 1.0017,
+      "step": 32
+    },
+    {
+      "epoch": 0.014823133071308253,
+      "grad_norm": 0.24498069286346436,
+      "learning_rate": 8.877368421052632e-05,
+      "loss": 1.0075,
+      "step": 33
+    },
+    {
+      "epoch": 0.015272318921953958,
+      "grad_norm": 0.21549324691295624,
+      "learning_rate": 8.82421052631579e-05,
+      "loss": 0.985,
+      "step": 34
+    },
+    {
+      "epoch": 0.015721504772599662,
+      "grad_norm": 0.22830170392990112,
+      "learning_rate": 8.771052631578948e-05,
+      "loss": 0.9629,
+      "step": 35
+    },
+    {
+      "epoch": 0.016170690623245366,
+      "grad_norm": 0.2677782475948334,
+      "learning_rate": 8.717894736842105e-05,
+      "loss": 1.014,
+      "step": 36
+    },
+    {
+      "epoch": 0.01661987647389107,
+      "grad_norm": 0.30911344289779663,
+      "learning_rate": 8.664736842105263e-05,
+      "loss": 0.9741,
+      "step": 37
+    },
+    {
+      "epoch": 0.01706906232453678,
+      "grad_norm": 0.26794004440307617,
+      "learning_rate": 8.61157894736842e-05,
+      "loss": 0.972,
+      "step": 38
+    },
+    {
+      "epoch": 0.017518248175182483,
+      "grad_norm": 0.26757925748825073,
+      "learning_rate": 8.55842105263158e-05,
+      "loss": 1.0005,
+      "step": 39
+    },
+    {
+      "epoch": 0.017967434025828188,
+      "grad_norm": 0.23241755366325378,
+      "learning_rate": 8.505263157894737e-05,
+      "loss": 0.9912,
+      "step": 40
+    },
+    {
+      "epoch": 0.018416619876473892,
+      "grad_norm": 0.23415440320968628,
+      "learning_rate": 8.452105263157896e-05,
+      "loss": 0.96,
+      "step": 41
+    },
+    {
+      "epoch": 0.018865805727119597,
+      "grad_norm": 0.2970597445964813,
+      "learning_rate": 8.398947368421053e-05,
+      "loss": 1.0017,
+      "step": 42
+    },
+    {
+      "epoch": 0.0193149915777653,
+      "grad_norm": 0.3233031630516052,
+      "learning_rate": 8.345789473684211e-05,
+      "loss": 0.9603,
+      "step": 43
+    },
+    {
+      "epoch": 0.019764177428411005,
+      "grad_norm": 0.25598403811454773,
+      "learning_rate": 8.292631578947368e-05,
+      "loss": 0.9232,
+      "step": 44
+    },
+    {
+      "epoch": 0.02021336327905671,
+      "grad_norm": 0.28324705362319946,
+      "learning_rate": 8.239473684210526e-05,
+      "loss": 0.9762,
+      "step": 45
+    },
+    {
+      "epoch": 0.020662549129702414,
+      "grad_norm": 0.2578172981739044,
+      "learning_rate": 8.186315789473683e-05,
+      "loss": 0.9924,
+      "step": 46
+    },
+    {
+      "epoch": 0.02111173498034812,
+      "grad_norm": 0.23937109112739563,
+      "learning_rate": 8.133157894736842e-05,
+      "loss": 0.9744,
+      "step": 47
+    },
+    {
+      "epoch": 0.021560920830993823,
+      "grad_norm": 0.2387576401233673,
+      "learning_rate": 8.080000000000001e-05,
+      "loss": 0.9448,
+      "step": 48
+    },
+    {
+      "epoch": 0.022010106681639528,
+      "grad_norm": 0.3055512309074402,
+      "learning_rate": 8.026842105263159e-05,
+      "loss": 0.9624,
+      "step": 49
+    },
+    {
+      "epoch": 0.022459292532285232,
+      "grad_norm": 0.3693564534187317,
+      "learning_rate": 7.973684210526316e-05,
+      "loss": 0.8829,
+      "step": 50
+    },
+    {
+      "epoch": 0.022459292532285232,
+      "eval_loss": 0.9505019783973694,
+      "eval_runtime": 206.986,
+      "eval_samples_per_second": 144.913,
+      "eval_steps_per_second": 4.532,
+      "step": 50
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 200,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 50,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 5,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 5.383070416896e+17,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": null
+}
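
The `learning_rate` values logged above are consistent with a linear warmup over the first 10 steps to a peak of 1.01e-4, followed by a linear decay that reaches zero at `max_steps` (200). The warmup length and peak value are inferred from the log rather than stated in this commit, so the following Python sketch is only a reconstruction under those assumptions; `linear_schedule_lr` is a hypothetical helper, not the trainer's own scheduler code.

```python
import math

PEAK_LR = 1.01e-4    # ASSUMPTION: inferred from the value logged at step 10
WARMUP_STEPS = 10    # ASSUMPTION: inferred from where the logged rate peaks
MAX_STEPS = 200      # "max_steps" in trainer_state.json


def linear_schedule_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0, MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)


# A few (step, learning_rate) pairs copied from log_history above.
LOGGED = {
    1: 1.0100000000000002e-05,
    10: 0.000101,
    11: 0.00010046842105263158,
    29: 9.09e-05,
    50: 7.973684210526316e-05,
}

# Compare the reconstructed schedule against the logged values.
matches = {
    step: math.isclose(linear_schedule_lr(step), lr, rel_tol=1e-9)
    for step, lr in LOGGED.items()
}
print(matches)
```

If the reconstruction holds, the checkpoint at step 50 was saved a quarter of the way through the planned 200 steps, with the learning rate at roughly 79% of its peak.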

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:96810c491ef65742e1753429520f88433563d84af659a240db34b050202c3f64
+size 6840