FatCat87 committed 22 days ago

Commit bc1a1c2 (verified) · 1 Parent(s): e45d937

Upload folder using huggingface_hub

Files changed (40)

checkpoint-21/README.md +202 -0
checkpoint-21/adapter_config.json +34 -0
checkpoint-21/adapter_model.safetensors +3 -0
checkpoint-21/optimizer.pt +3 -0
checkpoint-21/rng_state_0.pth +3 -0
checkpoint-21/rng_state_1.pth +3 -0
checkpoint-21/rng_state_2.pth +3 -0
checkpoint-21/rng_state_3.pth +3 -0
checkpoint-21/scheduler.pt +3 -0
checkpoint-21/special_tokens_map.json +35 -0
checkpoint-21/tokenizer.json +0 -0
checkpoint-21/tokenizer.model +3 -0
checkpoint-21/tokenizer_config.json +47 -0
checkpoint-21/trainer_state.json +212 -0
checkpoint-21/training_args.bin +3 -0
checkpoint-42/README.md +202 -0
checkpoint-42/adapter_config.json +34 -0
checkpoint-42/adapter_model.safetensors +3 -0
checkpoint-42/optimizer.pt +3 -0
checkpoint-42/rng_state_0.pth +3 -0
checkpoint-42/rng_state_1.pth +3 -0
checkpoint-42/rng_state_2.pth +3 -0
checkpoint-42/rng_state_3.pth +3 -0
checkpoint-42/scheduler.pt +3 -0
checkpoint-42/special_tokens_map.json +35 -0
checkpoint-42/tokenizer.json +0 -0
checkpoint-42/tokenizer.model +3 -0
checkpoint-42/tokenizer_config.json +47 -0
checkpoint-42/trainer_state.json +391 -0
checkpoint-42/training_args.bin +3 -0
merged/config.json +35 -0
merged/generation_config.json +7 -0
merged/pytorch_model-00001-of-00003.bin +3 -0
merged/pytorch_model-00002-of-00003.bin +3 -0
merged/pytorch_model-00003-of-00003.bin +3 -0
merged/pytorch_model.bin.index.json +298 -0
merged/special_tokens_map.json +35 -0
merged/tokenizer.json +0 -0
merged/tokenizer.model +3 -0
merged/tokenizer_config.json +47 -0

checkpoint-21/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: NousResearch/Yarn-Mistral-7b-64k
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-21/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "NousResearch/Yarn-Mistral-7b-64k",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj",
+    "down_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
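As a rough cross-check of the adapter settings above, the sketch below computes the LoRA scaling factor (`lora_alpha / r`, since `use_rslora` is false) and the number of adapter parameters implied by `r = 32` over the seven target modules. The layer shapes (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers) are assumptions based on the published Mistral-7B architecture and are not stated in this commit; the helper and constant names are illustrative only.

```python
# Assumed Mistral-7B shapes (NOT stated in this commit).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128
NUM_LAYERS = 32

# Values taken from adapter_config.json.
R = 32
LORA_ALPHA = 16

# (in_features, out_features) for each entry in "target_modules".
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


scaling = LORA_ALPHA / R
params_per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total_params = params_per_layer * NUM_LAYERS
fp32_bytes = total_params * 4  # 4 bytes per 32-bit float

print(scaling, total_params, fp32_bytes)
```

Under these assumptions the adapter holds 83,886,080 parameters, or 335,544,320 bytes as 32-bit floats. That is within roughly 60 KB of the 335,604,696-byte size recorded for adapter_model.safetensors in this commit, which would be consistent with fp32 weights plus file header overhead.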

checkpoint-21/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:375909d254f0b8a268643e1ac1cc35c723225435fd028214721830e52cb1c346
+size 335604696
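The three lines above are a Git LFS pointer: the repository stores only the spec version, a SHA-256 object ID, and the byte size, while the binary lives in LFS storage. A minimal sketch of reading such a pointer and checking downloaded bytes against it follows. The function names are hypothetical, and the check assumes the `sha256` algorithm shown in the pointer.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, oid, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the raw bytes match the pointer's size and SHA-256 digest."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


# Pointer contents copied from the diff above, without the leading "+".
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:375909d254f0b8a268643e1ac1cc35c723225435fd028214721830e52cb1c346
size 335604696
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

For files of this size it would be more practical to hash in chunks while streaming from disk rather than holding the whole file in memory; the in-memory version is kept here for brevity.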

checkpoint-21/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3bcd6e55b599f91c6ae805238a430241a113db300ab3fc03740ed20b2775a253
+size 168624724

checkpoint-21/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2684bc15ac0b3eadebd60c1a740cc8abb940a581e1c8c7e15b25067d802ec7e
+size 14960

checkpoint-21/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a6073382ef14beec365971bd56563156f8229e270e50b2e1c22a67a0fd771c4a
+size 14960

checkpoint-21/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:573e6fcf7b7cf83fdd9ada858d3852156f62fcf32aa7eac75f225c9288908f7f
+size 14960

checkpoint-21/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee4456f86560bfa75625fc170e84c81c895634ad6f3f0d4df7124e3869c178c8
+size 14960

checkpoint-21/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8057116cebca6e914194f0aea4f38ed11bfeac37985745f93da3fcdb776aa9d0
+size 1064

checkpoint-21/special_tokens_map.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-21/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-21/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+size 493443

checkpoint-21/tokenizer_config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": true
+}
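The `model_max_length` value above is not a real context limit. It equals `int(1e30)`, which appears to be the placeholder the transformers library writes when no maximum length is configured for the tokenizer. The sketch below shows the equality and one way a caller might substitute an explicit limit. The helper name and the 65536 fallback (suggested only by the "64k" in the base model name) are assumptions, not values from this commit.

```python
# Value copied from tokenizer_config.json.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# 1e30 is not exactly representable as a float, so int() exposes the rounding.
SENTINEL = int(1e30)


def effective_max_length(model_max_length, fallback):
    """Return `fallback` when model_max_length is the 'unset' sentinel (hypothetical helper)."""
    return fallback if model_max_length >= SENTINEL else model_max_length


print(MODEL_MAX_LENGTH == SENTINEL)
print(effective_max_length(MODEL_MAX_LENGTH, fallback=65536))
```

In practice this means truncation has to be requested with an explicit `max_length`, since the stored value never triggers it.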

checkpoint-21/trainer_state.json ADDED

	@@ -0,0 +1,212 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.9655172413793104,
+  "eval_steps": 6,
+  "global_step": 21,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.04597701149425287,
+      "grad_norm": 2.537923574447632,
+      "learning_rate": 0.0001,
+      "loss": 1.4151,
+      "step": 1
+    },
+    {
+      "epoch": 0.04597701149425287,
+      "eval_loss": 0.6549283862113953,
+      "eval_runtime": 59.8262,
+      "eval_samples_per_second": 3.31,
+      "eval_steps_per_second": 0.418,
+      "step": 1
+    },
+    {
+      "epoch": 0.09195402298850575,
+      "grad_norm": 2.5883562564849854,
+      "learning_rate": 0.0002,
+      "loss": 1.3772,
+      "step": 2
+    },
+    {
+      "epoch": 0.13793103448275862,
+      "grad_norm": 2.2641923427581787,
+      "learning_rate": 0.0001996917333733128,
+      "loss": 1.3153,
+      "step": 3
+    },
+    {
+      "epoch": 0.1839080459770115,
+      "grad_norm": 5.304536819458008,
+      "learning_rate": 0.00019876883405951377,
+      "loss": 1.1035,
+      "step": 4
+    },
+    {
+      "epoch": 0.22988505747126436,
+      "grad_norm": 1.074016809463501,
+      "learning_rate": 0.00019723699203976766,
+      "loss": 0.9059,
+      "step": 5
+    },
+    {
+      "epoch": 0.27586206896551724,
+      "grad_norm": 0.8219733834266663,
+      "learning_rate": 0.00019510565162951537,
+      "loss": 0.8678,
+      "step": 6
+    },
+    {
+      "epoch": 0.27586206896551724,
+      "eval_loss": 0.6351205706596375,
+      "eval_runtime": 59.9835,
+      "eval_samples_per_second": 3.301,
+      "eval_steps_per_second": 0.417,
+      "step": 6
+    },
+    {
+      "epoch": 0.3218390804597701,
+      "grad_norm": 1.478880524635315,
+      "learning_rate": 0.0001923879532511287,
+      "loss": 0.8782,
+      "step": 7
+    },
+    {
+      "epoch": 0.367816091954023,
+      "grad_norm": 0.7757106423377991,
+      "learning_rate": 0.0001891006524188368,
+      "loss": 0.7993,
+      "step": 8
+    },
+    {
+      "epoch": 0.41379310344827586,
+      "grad_norm": 0.5512004494667053,
+      "learning_rate": 0.00018526401643540922,
+      "loss": 0.7126,
+      "step": 9
+    },
+    {
+      "epoch": 0.45977011494252873,
+      "grad_norm": 1.5733872652053833,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 0.685,
+      "step": 10
+    },
+    {
+      "epoch": 0.5057471264367817,
+      "grad_norm": 0.34443506598472595,
+      "learning_rate": 0.0001760405965600031,
+      "loss": 0.653,
+      "step": 11
+    },
+    {
+      "epoch": 0.5517241379310345,
+      "grad_norm": 0.5645097494125366,
+      "learning_rate": 0.00017071067811865476,
+      "loss": 0.6408,
+      "step": 12
+    },
+    {
+      "epoch": 0.5517241379310345,
+      "eval_loss": 0.6028693318367004,
+      "eval_runtime": 59.4134,
+      "eval_samples_per_second": 3.333,
+      "eval_steps_per_second": 0.421,
+      "step": 12
+    },
+    {
+      "epoch": 0.5977011494252874,
+      "grad_norm": 0.5011037588119507,
+      "learning_rate": 0.00016494480483301836,
+      "loss": 0.6326,
+      "step": 13
+    },
+    {
+      "epoch": 0.6436781609195402,
+      "grad_norm": 0.2876424789428711,
+      "learning_rate": 0.00015877852522924732,
+      "loss": 0.6937,
+      "step": 14
+    },
+    {
+      "epoch": 0.6896551724137931,
+      "grad_norm": 0.47212648391723633,
+      "learning_rate": 0.0001522498564715949,
+      "loss": 0.6298,
+      "step": 15
+    },
+    {
+      "epoch": 0.735632183908046,
+      "grad_norm": 0.6836615800857544,
+      "learning_rate": 0.00014539904997395468,
+      "loss": 0.5569,
+      "step": 16
+    },
+    {
+      "epoch": 0.7816091954022989,
+      "grad_norm": 0.26638293266296387,
+      "learning_rate": 0.000138268343236509,
+      "loss": 0.6057,
+      "step": 17
+    },
+    {
+      "epoch": 0.8275862068965517,
+      "grad_norm": 0.5527795553207397,
+      "learning_rate": 0.00013090169943749476,
+      "loss": 0.6546,
+      "step": 18
+    },
+    {
+      "epoch": 0.8275862068965517,
+      "eval_loss": 0.5754240155220032,
+      "eval_runtime": 59.3448,
+      "eval_samples_per_second": 3.336,
+      "eval_steps_per_second": 0.421,
+      "step": 18
+    },
+    {
+      "epoch": 0.8735632183908046,
+      "grad_norm": 0.3253306746482849,
+      "learning_rate": 0.00012334453638559057,
+      "loss": 0.5521,
+      "step": 19
+    },
+    {
+      "epoch": 0.9195402298850575,
+      "grad_norm": 0.17234736680984497,
+      "learning_rate": 0.0001156434465040231,
+      "loss": 0.5802,
+      "step": 20
+    },
+    {
+      "epoch": 0.9655172413793104,
+      "grad_norm": 0.2543017864227295,
+      "learning_rate": 0.0001078459095727845,
+      "loss": 0.5579,
+      "step": 21
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 42,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 21,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.1881844810396467e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
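The `learning_rate` values in `log_history` above are consistent with two steps of linear warmup to a peak of 2e-4 followed by cosine decay to zero at `max_steps` = 42. The scheduler type is not recorded in trainer_state.json, so this is an inference from the logged numbers rather than a confirmed setting. A minimal sketch that reproduces the logged values under that assumption:

```python
import math

PEAK_LR = 2e-4     # highest learning_rate in log_history (step 2)
WARMUP_STEPS = 2   # inferred: the rate rises linearly over steps 1 and 2
MAX_STEPS = 42     # "max_steps" in trainer_state.json


def logged_lr(step):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few values copied from log_history for comparison.
LOGGED = {
    1: 0.0001,
    2: 0.0002,
    3: 0.0001996917333733128,
    12: 0.00017071067811865476,
    21: 0.0001078459095727845,
}

for step, expected in LOGGED.items():
    print(step, logged_lr(step), expected)
```

Under this schedule the rate at step 21, where this checkpoint was saved, is still just above half of the peak, so the run is only midway through its decay.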

checkpoint-21/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7597b9398cd5a80ae99530d1801ba0bdb201f15ee1f1b3ae6442df8bd1d62d0
+size 6200

checkpoint-42/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: NousResearch/Yarn-Mistral-7b-64k
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-42/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "NousResearch/Yarn-Mistral-7b-64k",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj",
+    "down_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-42/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18e47b5d8ddd70d39b37d280c3af27514e07cf1a7f72078b7d37bfda8479c74d
+size 335604696

checkpoint-42/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c125dd1aafa5e7583d4c20081726039e1cb69f66c81bfa920f2136f15b63ff5
+size 168624724

checkpoint-42/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:254239861923df0d7f8a0b2c4b0af95b72ea41a4cefe224f8734022aa686bbc3
+size 14960

checkpoint-42/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cabb2a184bccf68c43275a0ee3c601cc65cce9c9fb494736c57136f56e14b39b
+size 14960

checkpoint-42/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3cd024cba4fb79b42561cdebaad820564bc22220e5fe20fb905f21934324add4
+size 14960

checkpoint-42/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:730583047ac80431a59a707690c52653caca27c3936faa3a515bd38baf1b0908
+size 14960

checkpoint-42/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:da2e7edae03699c1cd9559730363e2f67d4336103b5fded6865a5437f817ac2d
+size 1064

checkpoint-42/special_tokens_map.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-42/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-42/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+size 493443

checkpoint-42/tokenizer_config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": true
+}

checkpoint-42/trainer_state.json ADDED

	@@ -0,0 +1,391 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.8850574712643677,
+  "eval_steps": 6,
+  "global_step": 42,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.04597701149425287,
+      "grad_norm": 2.537923574447632,
+      "learning_rate": 0.0001,
+      "loss": 1.4151,
+      "step": 1
+    },
+    {
+      "epoch": 0.04597701149425287,
+      "eval_loss": 0.6549283862113953,
+      "eval_runtime": 59.8262,
+      "eval_samples_per_second": 3.31,
+      "eval_steps_per_second": 0.418,
+      "step": 1
+    },
+    {
+      "epoch": 0.09195402298850575,
+      "grad_norm": 2.5883562564849854,
+      "learning_rate": 0.0002,
+      "loss": 1.3772,
+      "step": 2
+    },
+    {
+      "epoch": 0.13793103448275862,
+      "grad_norm": 2.2641923427581787,
+      "learning_rate": 0.0001996917333733128,
+      "loss": 1.3153,
+      "step": 3
+    },
+    {
+      "epoch": 0.1839080459770115,
+      "grad_norm": 5.304536819458008,
+      "learning_rate": 0.00019876883405951377,
+      "loss": 1.1035,
+      "step": 4
+    },
+    {
+      "epoch": 0.22988505747126436,
+      "grad_norm": 1.074016809463501,
+      "learning_rate": 0.00019723699203976766,
+      "loss": 0.9059,
+      "step": 5
+    },
+    {
+      "epoch": 0.27586206896551724,
+      "grad_norm": 0.8219733834266663,
+      "learning_rate": 0.00019510565162951537,
+      "loss": 0.8678,
+      "step": 6
+    },
+    {
+      "epoch": 0.27586206896551724,
+      "eval_loss": 0.6351205706596375,
+      "eval_runtime": 59.9835,
+      "eval_samples_per_second": 3.301,
+      "eval_steps_per_second": 0.417,
+      "step": 6
+    },
+    {
+      "epoch": 0.3218390804597701,
+      "grad_norm": 1.478880524635315,
+      "learning_rate": 0.0001923879532511287,
+      "loss": 0.8782,
+      "step": 7
+    },
+    {
+      "epoch": 0.367816091954023,
+      "grad_norm": 0.7757106423377991,
+      "learning_rate": 0.0001891006524188368,
+      "loss": 0.7993,
+      "step": 8
+    },
+    {
+      "epoch": 0.41379310344827586,
+      "grad_norm": 0.5512004494667053,
+      "learning_rate": 0.00018526401643540922,
+      "loss": 0.7126,
+      "step": 9
+    },
+    {
+      "epoch": 0.45977011494252873,
+      "grad_norm": 1.5733872652053833,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 0.685,
+      "step": 10
+    },
+    {
+      "epoch": 0.5057471264367817,
+      "grad_norm": 0.34443506598472595,
+      "learning_rate": 0.0001760405965600031,
+      "loss": 0.653,
+      "step": 11
+    },
+    {
+      "epoch": 0.5517241379310345,
+      "grad_norm": 0.5645097494125366,
+      "learning_rate": 0.00017071067811865476,
+      "loss": 0.6408,
+      "step": 12
+    },
+    {
+      "epoch": 0.5517241379310345,
+      "eval_loss": 0.6028693318367004,
+      "eval_runtime": 59.4134,
+      "eval_samples_per_second": 3.333,
+      "eval_steps_per_second": 0.421,
+      "step": 12
+    },
+    {
+      "epoch": 0.5977011494252874,
+      "grad_norm": 0.5011037588119507,
+      "learning_rate": 0.00016494480483301836,
+      "loss": 0.6326,
+      "step": 13
+    },
+    {
+      "epoch": 0.6436781609195402,
+      "grad_norm": 0.2876424789428711,
+      "learning_rate": 0.00015877852522924732,
+      "loss": 0.6937,
+      "step": 14
+    },
+    {
+      "epoch": 0.6896551724137931,
+      "grad_norm": 0.47212648391723633,
+      "learning_rate": 0.0001522498564715949,
+      "loss": 0.6298,
+      "step": 15
+    },
+    {
+      "epoch": 0.735632183908046,
+      "grad_norm": 0.6836615800857544,
+      "learning_rate": 0.00014539904997395468,
+      "loss": 0.5569,
+      "step": 16
+    },
+    {
+      "epoch": 0.7816091954022989,
+      "grad_norm": 0.26638293266296387,
+      "learning_rate": 0.000138268343236509,
+      "loss": 0.6057,
+      "step": 17
+    },
+    {
+      "epoch": 0.8275862068965517,
+      "grad_norm": 0.5527795553207397,
+      "learning_rate": 0.00013090169943749476,
+      "loss": 0.6546,
+      "step": 18
+    },
+    {
+      "epoch": 0.8275862068965517,
+      "eval_loss": 0.5754240155220032,
+      "eval_runtime": 59.3448,
+      "eval_samples_per_second": 3.336,
+      "eval_steps_per_second": 0.421,
+      "step": 18
+    },
+    {
+      "epoch": 0.8735632183908046,
+      "grad_norm": 0.3253306746482849,
+      "learning_rate": 0.00012334453638559057,
+      "loss": 0.5521,
+      "step": 19
+    },
+    {
+      "epoch": 0.9195402298850575,
+      "grad_norm": 0.17234736680984497,
+      "learning_rate": 0.0001156434465040231,
+      "loss": 0.5802,
+      "step": 20
+    },
+    {
+      "epoch": 0.9655172413793104,
+      "grad_norm": 0.2543017864227295,
+      "learning_rate": 0.0001078459095727845,
+      "loss": 0.5579,
+      "step": 21
+    },
+    {
+      "epoch": 1.0114942528735633,
+      "grad_norm": 0.8507859706878662,
+      "learning_rate": 0.0001,
+      "loss": 0.5795,
+      "step": 22
+    },
+    {
+      "epoch": 1.0114942528735633,
+      "grad_norm": 0.29268795251846313,
+      "learning_rate": 9.215409042721552e-05,
+      "loss": 0.6311,
+      "step": 23
+    },
+    {
+      "epoch": 1.0574712643678161,
+      "grad_norm": 0.18724839389324188,
+      "learning_rate": 8.435655349597689e-05,
+      "loss": 0.574,
+      "step": 24
+    },
+    {
+      "epoch": 1.0574712643678161,
+      "eval_loss": 0.5620841979980469,
+      "eval_runtime": 59.4173,
+      "eval_samples_per_second": 3.332,
+      "eval_steps_per_second": 0.421,
+      "step": 24
+    },
+    {
+      "epoch": 1.103448275862069,
+      "grad_norm": 0.21775011718273163,
+      "learning_rate": 7.66554636144095e-05,
+      "loss": 0.5818,
+      "step": 25
+    },
+    {
+      "epoch": 1.1494252873563218,
+      "grad_norm": 0.24518676102161407,
+      "learning_rate": 6.909830056250527e-05,
+      "loss": 0.5618,
+      "step": 26
+    },
+    {
+      "epoch": 1.1954022988505748,
+      "grad_norm": 0.24137045443058014,
+      "learning_rate": 6.173165676349103e-05,
+      "loss": 0.6078,
+      "step": 27
+    },
+    {
+      "epoch": 1.2413793103448276,
+      "grad_norm": 0.17265664041042328,
+      "learning_rate": 5.4600950026045326e-05,
+      "loss": 0.5907,
+      "step": 28
+    },
+    {
+      "epoch": 1.2873563218390804,
+      "grad_norm": 0.1558091938495636,
+      "learning_rate": 4.7750143528405126e-05,
+      "loss": 0.553,
+      "step": 29
+    },
+    {
+      "epoch": 1.3333333333333333,
+      "grad_norm": 0.15223024785518646,
+      "learning_rate": 4.12214747707527e-05,
+      "loss": 0.5518,
+      "step": 30
+    },
+    {
+      "epoch": 1.3333333333333333,
+      "eval_loss": 0.5595065951347351,
+      "eval_runtime": 59.4325,
+      "eval_samples_per_second": 3.332,
+      "eval_steps_per_second": 0.421,
+      "step": 30
+    },
+    {
+      "epoch": 1.3793103448275863,
+      "grad_norm": 0.25727611780166626,
+      "learning_rate": 3.5055195166981645e-05,
+      "loss": 0.5463,
+      "step": 31
+    },
+    {
+      "epoch": 1.4252873563218391,
+      "grad_norm": 0.13202741742134094,
+      "learning_rate": 2.9289321881345254e-05,
+      "loss": 0.5598,
+      "step": 32
+    },
+    {
+      "epoch": 1.471264367816092,
+      "grad_norm": 0.20626722276210785,
+      "learning_rate": 2.3959403439996907e-05,
+      "loss": 0.5242,
+      "step": 33
+    },
+    {
+      "epoch": 1.5172413793103448,
+      "grad_norm": 0.16128551959991455,
+      "learning_rate": 1.9098300562505266e-05,
+      "loss": 0.5715,
+      "step": 34
+    },
+    {
+      "epoch": 1.5632183908045976,
+      "grad_norm": 0.346206396818161,
+      "learning_rate": 1.4735983564590783e-05,
+      "loss": 0.5265,
+      "step": 35
+    },
+    {
+      "epoch": 1.6091954022988506,
+      "grad_norm": 0.1324978768825531,
+      "learning_rate": 1.0899347581163221e-05,
+      "loss": 0.5446,
+      "step": 36
+    },
+    {
+      "epoch": 1.6091954022988506,
+      "eval_loss": 0.5576122403144836,
+      "eval_runtime": 59.5124,
+      "eval_samples_per_second": 3.327,
+      "eval_steps_per_second": 0.42,
+      "step": 36
+    },
+    {
+      "epoch": 1.6551724137931034,
+      "grad_norm": 0.1254015415906906,
+      "learning_rate": 7.612046748871327e-06,
+      "loss": 0.5739,
+      "step": 37
+    },
+    {
+      "epoch": 1.7011494252873565,
+      "grad_norm": 0.18170268833637238,
+      "learning_rate": 4.8943483704846475e-06,
+      "loss": 0.5717,
+      "step": 38
+    },
+    {
+      "epoch": 1.7471264367816093,
+      "grad_norm": 0.12662401795387268,
+      "learning_rate": 2.7630079602323442e-06,
+      "loss": 0.5424,
+      "step": 39
+    },
+    {
+      "epoch": 1.793103448275862,
+      "grad_norm": 0.1479385495185852,
+      "learning_rate": 1.231165940486234e-06,
+      "loss": 0.563,
+      "step": 40
+    },
+    {
+      "epoch": 1.839080459770115,
+      "grad_norm": 0.13152529299259186,
+      "learning_rate": 3.0826662668720364e-07,
+      "loss": 0.5698,
+      "step": 41
+    },
+    {
+      "epoch": 1.8850574712643677,
+      "grad_norm": 0.17284046113491058,
+      "learning_rate": 0.0,
+      "loss": 0.6227,
+      "step": 42
+    },
+    {
+      "epoch": 1.8850574712643677,
+      "eval_loss": 0.5568149089813232,
+      "eval_runtime": 59.8384,
+      "eval_samples_per_second": 3.309,
+      "eval_steps_per_second": 0.418,
+      "step": 42
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 42,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 21,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.3746008314177126e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
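
Note on the logged learning rates: the values in the entries above appear to follow a cosine decay after a short linear warmup. The sketch below reproduces a few of them. The peak rate of 2e-4 and the 2 warmup steps are not stated in this file; they are inferred from the numbers (the rate is at half its peak at step 22 and reaches 0 at step 42), so treat both as assumptions. `cosine_lr` is a hypothetical helper written for this check, not a library function.

```python
import math

# Assumed hyperparameters, inferred from the logged values (not stated in trainer_state.json).
PEAK_LR = 2e-4
WARMUP_STEPS = 2
MAX_STEPS = 42  # "max_steps" as recorded in trainer_state.json


def cosine_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, max_steps=MAX_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from the log entries above.
logged = {
    18: 0.00013090169943749476,
    22: 0.0001,
    30: 4.12214747707527e-05,
    42: 0.0,
}

for step, lr in logged.items():
    match = math.isclose(cosine_lr(step), lr, rel_tol=1e-9, abs_tol=1e-12)
    print(step, lr, cosine_lr(step), match)
```

If the assumed peak and warmup are right, every comparison should report a match; a mismatch would suggest a different scheduler or different hyperparameters.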

checkpoint-42/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7597b9398cd5a80ae99530d1801ba0bdb201f15ee1f1b3ae6442df8bd1d62d0
+size 6200

merged/config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "_name_or_path": "NousResearch/Yarn-Mistral-7b-64k",
+  "architectures": [
+    "MistralForCausalLM"
+  ],
+  "auto_map": {
+    "AutoConfig": "NousResearch/Yarn-Mistral-7b-64k--configuration_mistral.MistralConfig",
+    "AutoModelForCausalLM": "NousResearch/Yarn-Mistral-7b-64k--modeling_mistral_yarn.MistralForCausalLM"
+  },
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 32768,
+  "model_type": "mistral",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 8.0,
+    "finetuned": true,
+    "original_max_position_embeddings": 8192,
+    "type": "yarn"
+  },
+  "rope_theta": 10000.0,
+  "sliding_window": 65536,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.42.3",
+  "use_cache": false,
+  "vocab_size": 32000
+}
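
Note on the context length: the `rope_scaling` block above can be related to the "64k" in the base model name with a line of arithmetic. The sketch assumes the YaRN `factor` is the ratio of extended to original context length; how the remote modeling code actually uses `max_position_embeddings` (32768 here) is not shown in this diff. `scaled_context_length` is a hypothetical helper.

```python
import json

# Excerpt of merged/config.json, values copied from the diff above.
config = json.loads("""
{
  "max_position_embeddings": 32768,
  "rope_scaling": {
    "factor": 8.0,
    "finetuned": true,
    "original_max_position_embeddings": 8192,
    "type": "yarn"
  },
  "sliding_window": 65536
}
""")


def scaled_context_length(cfg):
    """Extended context length, assuming factor = extended / original."""
    rope = cfg["rope_scaling"]
    return int(rope["factor"] * rope["original_max_position_embeddings"])


extended = scaled_context_length(config)
print(extended)
# Compare with the sliding window recorded in the same config.
print(extended == config["sliding_window"])
```

Because `auto_map` points at modeling code hosted in the base model repository, loading this config through `transformers` would likely require `trust_remote_code=True`.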

merged/generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "do_sample": true,
+  "eos_token_id": 2,
+  "transformers_version": "4.42.3"
+}

merged/pytorch_model-00001-of-00003.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:67dd4fc16babff489884f4f1c1bb3132c12ccd10d1aaa82e6e07e5e25d2c4ae6
+size 4943184288

merged/pytorch_model-00002-of-00003.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fe27a44b38acb4f7ae0c94b145864da4aff40d7f53c8df28da1199d0498dcc74
+size 4999843272

merged/pytorch_model-00003-of-00003.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f4abda7ec763df491e5b4ca6c7d2c9af4f36b35724bb07e77592f074f0268cf8
+size 4540536134

merged/pytorch_model.bin.index.json ADDED

	@@ -0,0 +1,298 @@

+{
+  "metadata": {
+    "total_size": 14483464192
+  },
+  "weight_map": {
+    "lm_head.weight": "pytorch_model-00003-of-00003.bin",
+    "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+    "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+    "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+    "model.norm.weight": "pytorch_model-00003-of-00003.bin"
+  }
+}
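
Note on `total_size`: the byte count in the index metadata can be cross-checked against the dimensions in `merged/config.json`. The sketch below assumes the standard Mistral weight layout (bias-free projections, with `k_proj` and `v_proj` sized by `num_key_value_heads`) and 2 bytes per parameter for `bfloat16`; the index lists tensor names only, so the shapes are an assumption. `count_parameters` is a hypothetical helper.

```python
# Values copied from merged/config.json and merged/pytorch_model.bin.index.json.
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_attention_heads = 32
num_key_value_heads = 8
vocab_size = 32000
total_size_bytes = 14483464192
BYTES_PER_PARAM = 2  # bfloat16, per "torch_dtype" in config.json


def count_parameters():
    """Parameter count for a bias-free, Mistral-style decoder (assumed layout)."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj, down_proj.
    mlp = 3 * hidden_size * intermediate_size
    # input_layernorm and post_attention_layernorm.
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms
    # embed_tokens and lm_head are separate because tie_word_embeddings is false.
    embeddings = 2 * vocab_size * hidden_size
    final_norm = hidden_size
    return num_hidden_layers * per_layer + embeddings + final_norm


params = count_parameters()
print(params)
print(params * BYTES_PER_PARAM == total_size_bytes)
```

The three shard files sum to slightly more than `total_size`, which would be consistent with per-file serialization overhead on top of the raw tensor bytes.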

merged/special_tokens_map.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

merged/tokenizer.json ADDED

The diff for this file is too large to render.

merged/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+size 493443

merged/tokenizer_config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<unk>",
+    "<s>",
+    "</s>"
+  ],
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": true
+}
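
Note on `model_max_length`: the very large value above equals `int(1e30)`, which appears to be the placeholder `transformers` writes when no tokenizer length limit is configured, so it should not be read as a usable sequence length. The sketch below shows one way to guard against it. `effective_max_length` is a hypothetical helper, and the 65536 fallback is an assumption taken from `sliding_window` in `merged/config.json`.

```python
# The value of "model_max_length" in the diff above.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# 1e30 is not exactly representable as a double, so int(1e30) gives this odd-looking integer.
UNSET_SENTINEL = int(1e30)


def effective_max_length(tokenizer_max, context_length):
    """Fall back to an explicit context length when the tokenizer limit looks unset."""
    if tokenizer_max >= UNSET_SENTINEL:
        return context_length
    return min(tokenizer_max, context_length)


print(MODEL_MAX_LENGTH == UNSET_SENTINEL)
# 65536 is an assumed context length (see sliding_window in merged/config.json).
print(effective_max_length(MODEL_MAX_LENGTH, 65536))
```

In practice this means passing an explicit `max_length` when tokenizing with truncation, rather than relying on the tokenizer default.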