hf-100 committed on
Commit f0b767e (1 parent: c3c9fd3)

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,202 @@
---
base_model: ai21labs/AI21-Jamba-1.5-Mini
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
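
Pending the card's own snippet, here is a minimal sketch of how this LoRA adapter could be loaded on top of `ai21labs/AI21-Jamba-1.5-Mini` with Transformers and PEFT. The adapter repository id, dtype, and device settings below are illustrative assumptions, not values documented in this upload.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "ai21labs/AI21-Jamba-1.5-Mini"   # base model from the YAML header above
adapter_id = "<this-adapter-repo-id>"      # placeholder: replace with this repository's id

# Load the base model (a recent transformers release with Jamba support is assumed),
# then attach the LoRA adapter weights from this repository.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # assumption; pick a dtype that fits your hardware
    device_map="auto",            # requires `accelerate`
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Give me a one-sentence summary of LoRA."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```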

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.12.0
adapter_config.json ADDED
@@ -0,0 +1,38 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "ai21labs/AI21-Jamba-1.5-Mini",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 8,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "down_proj",
    "up_proj",
    "out_proj",
    "x_proj",
    "in_proj",
    "v_proj",
    "q_proj",
    "gate_proj",
    "embed_tokens",
    "k_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
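
For reference, these settings correspond roughly to the following PEFT `LoraConfig` at training time. This is a reconstruction from the JSON above, not the authors' actual training script.

```python
from peft import LoraConfig

# Reconstructed from adapter_config.json: rank-8 LoRA over the attention,
# MLP/expert, Mamba, and embedding projections of Jamba-1.5-Mini.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # MLP / MoE experts
        "in_proj", "x_proj", "out_proj",          # Mamba blocks
        "embed_tokens",                           # token embeddings
    ],
)
```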
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c41692ffa65157911fb7618fa6b6bc6f4b8ef104e257b5fc7c103dad2fab1d2c
size 1061034800
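
The adapter weights come to roughly 1 GB. A quick way to inspect what was saved (a sketch, assuming the file has been downloaded locally):

```python
from safetensors.torch import load_file

# Load the LoRA tensors stored in this repository (local path assumed).
state_dict = load_file("adapter_model.safetensors")

total = sum(t.numel() for t in state_dict.values())
print(f"{len(state_dict)} tensors, {total / 1e6:.1f}M adapter parameters")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)
```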
optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:55a703896316aeb91d9ca5265a2cef996013942fd564bf833211c82059b0c69c
size 1049332688
rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c741ff062b9e5fa41a2e2c74153f537ac32edc52afb3552de422680a3bf8bad9
size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f690b5a12c96bc867701ba82bb78243d7b91bcd6c93c5ddc786ea740de3b6076
size 1064
special_tokens_map.json ADDED
@@ -0,0 +1,48 @@
{
  "additional_special_tokens": [
    "<|eom|>",
    "<|bom|>",
    "<|system|>",
    "<|user|>",
    "<|assistant|>",
    "<|tool|>",
    "<documents>",
    "</documents>",
    "<tool_definitions>",
    "</tool_definitions>",
    "<active_output_modes>",
    "</active_output_modes>",
    "<citations>",
    "</citations>",
    "<tool_calls>",
    "</tool_calls>"
  ],
  "bos_token": {
    "content": "<|startoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|unk|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,196 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<|pad|>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<|startoftext|>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "<|endoftext|>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "3": {
31
+ "content": "<|unk|>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "518": {
39
+ "content": "<|eom|>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "519": {
47
+ "content": "<|bom|>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "520": {
55
+ "content": "<|system|>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "521": {
63
+ "content": "<|user|>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "522": {
71
+ "content": "<|assistant|>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": true
77
+ },
78
+ "523": {
79
+ "content": "<|tool|>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "524": {
87
+ "content": "<documents>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": true
93
+ },
94
+ "525": {
95
+ "content": "</documents>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": true
101
+ },
102
+ "526": {
103
+ "content": "<tool_definitions>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": true
109
+ },
110
+ "527": {
111
+ "content": "</tool_definitions>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": true
117
+ },
118
+ "528": {
119
+ "content": "<active_output_modes>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": true
125
+ },
126
+ "529": {
127
+ "content": "</active_output_modes>",
128
+ "lstrip": false,
129
+ "normalized": false,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": true
133
+ },
134
+ "530": {
135
+ "content": "<citations>",
136
+ "lstrip": false,
137
+ "normalized": false,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": true
141
+ },
142
+ "531": {
143
+ "content": "</citations>",
144
+ "lstrip": false,
145
+ "normalized": false,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": true
149
+ },
150
+ "532": {
151
+ "content": "<tool_calls>",
152
+ "lstrip": false,
153
+ "normalized": false,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": true
157
+ },
158
+ "533": {
159
+ "content": "</tool_calls>",
160
+ "lstrip": false,
161
+ "normalized": false,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": true
165
+ }
166
+ },
167
+ "additional_special_tokens": [
168
+ "<|eom|>",
169
+ "<|bom|>",
170
+ "<|system|>",
171
+ "<|user|>",
172
+ "<|assistant|>",
173
+ "<|tool|>",
174
+ "<documents>",
175
+ "</documents>",
176
+ "<tool_definitions>",
177
+ "</tool_definitions>",
178
+ "<active_output_modes>",
179
+ "</active_output_modes>",
180
+ "<citations>",
181
+ "</citations>",
182
+ "<tool_calls>",
183
+ "</tool_calls>"
184
+ ],
185
+ "bos_token": "<|startoftext|>",
186
+ "chat_template": "{# Variables #}\n{% set ns = namespace(message_count=0, is_last_checked_defined=False) %}\n{##}\n{% set bom_str = bom_str or \"<|bom|>\" %}\n{% set eom_str = eom_str or \"<|eom|>\" %}\n{% set default_system_message = \"\" %}\n{##}\n{% set documents_prefix = \"<documents>\" %}\n{% set documents_suffix = \"</documents>\" %}\n{% set tool_definitions_prefix = \"<tool_definitions>\" %}\n{% set tool_definitions_suffix = \"</tool_definitions>\" %}\n{% set active_modes_prefix = \"<active_output_modes>\" %}\n{% set active_modes_suffix = \"</active_output_modes>\" %}\n{##}\n{% set tool_calls_prefix = \"<tool_calls>\" %}\n{% set tool_calls_suffix = \"</tool_calls>\" %}\n{% set citations_prefix = \"<citations>\" %}\n{% set citations_suffix = \"</citations>\" %}\n{##}\n{% if add_generation_prompt is not defined %}\n {% set add_generation_prompt = True %}\n{% endif %}\n{% set role_to_predict = role_to_predict or \"assistant\" %}\n{% if messages|length > 0 and messages[0].role == \"system\" %}\n {% set system_message = messages[0].content %}\n {% set loop_messages = messages[1:] %}\n{% else %}\n {% set system_message = default_system_message %}\n {% set loop_messages = messages %}\n{% endif %}\n{##}\n{##}\n{# Macros #}\n{% macro handle_tool_definitions(tools) %}\n {{- tool_definitions_prefix -}}\n {{- \"\\n# Tools\" -}}\n {{- \"\\n\\n## Functions\" -}}\n {% for tool in tools %}\n {% set _ = is_param_set(tool, field=\"type\") %}\n {% set is_tool_type_set = ns.is_last_checked_defined %}\n {% if is_tool_type_set %}\n {% if tool.type == \"function\" %}\n {% set tool = tool.function %}\n {% else %}\n {{ raise_exception(\"Currently, the only supported tool type is `function`\") }}\n {% endif %}\n {% endif %}\n {{- \"\\n\\n\" + (tool|tojson(indent=2)) -}}\n {% endfor %}\n {{- \"\\n\" + tool_definitions_suffix -}}\n{% endmacro %}\n{##}\n{% macro handle_first_system_message(system_message, tools) %}\n {{- bom_str + handle_role(\"system\") -}}\n {% set _ = is_param_set(system_message) %}\n {% set is_system_message_set = ns.is_last_checked_defined %}\n {% if is_system_message_set %}\n {{- system_message -}}\n {% endif %}\n {% set _ = is_param_set(tools, is_list=True) %}\n {% set is_tools_set = ns.is_last_checked_defined %}\n {% if is_tools_set %}\n {% if system_message %}\n {{- \"\\n\\n\" -}}\n {% endif %}\n {{- handle_tool_definitions(tools) -}}\n {% endif %}\n {% set ns.message_count = ns.message_count + 1 %}\n{% endmacro %}\n{##}\n{% macro handle_tool_calls(tool_calls) %}\n {{- tool_calls_prefix + \"[\\n\" -}}\n {% for tool_call in tool_calls %}\n {% set _ = is_param_set(tool_call, field=\"function\") %}\n {% set is_tool_call_function_set = ns.is_last_checked_defined %}\n {% if is_tool_call_function_set %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {% set arguments = tool_call.arguments %}\n {% if arguments is not string %}\n {%- set arguments = arguments|tojson -%}\n {%- endif %}\n {{ \"{\\\"name\\\": \\\"\" + tool_call.name + \"\\\", \\\"arguments\\\": \" + arguments + \"}\" -}}\n {% if not loop.last %}\n {{- \",\" }}\n {% endif %}\n {% endfor %}\n {{- \"\\n]\" + tool_calls_suffix -}}\n{% endmacro %}\n{##}\n{% macro handle_documents(documents) %}\n {{- documents_prefix -}}\n {{- \"\\n# Documents\" -}}\n {{- \"\\n\\nYou can use the following documents for reference:\" -}}\n {% for doc in documents %}\n {{- \"\\n\\n## Document ID: \" + loop.index0|string -}}\n {% set _ = is_param_set(doc, field=\"title\") %}\n {% set is_doc_title_set = ns.is_last_checked_defined %}\n {% if 
is_doc_title_set %}\n {{- \"\\nTitle: \" + doc.title -}}\n {% endif %}\n {% for key, value in doc.items() %}\n {% if key not in [\"title\", \"text\"] %}\n {{- \"\\n\" + key|title + \": \" + value|string -}}\n {% endif %}\n {% endfor %}\n {{- \"\\nText: \" + doc.text -}}\n {% endfor %}\n {{- \"\\n\" + documents_suffix -}}\n{% endmacro %}\n{##}\n{% macro handle_knobs(knobs) %}\n {{- active_modes_prefix -}}\n {{- \"\\n# Active Modes\" -}}\n {{ \"\\n\\nThe following modes configure the format or style of your responses. You should adhere to all currently\" -}}\n {{ \" active modes simultaneously.\" -}}\n {% if knobs.citation_mode == \"fast\" %}\n {{- \"\\n\\n## Citation Mode\" -}}\n {{- \"\\n\\nProvide a list of references only for the documents you base your response on. Format your response\" -}}\n {{ \" with the original answer followed by a citation section. Use this template:\" -}}\n {{ \" `{answer}\" + citations_prefix + \"DOCUMENT_IDS\" + citations_suffix + \"`, where DOCUMENT_IDS are the relevant document numbers\" -}}\n {{ \" (e.g. [2, 5, 9]), or [] if the answer cannot be supported by the provided documents.\" -}}\n {% endif %}\n {% if knobs.response_format == \"json_object\" %}\n {{- \"\\n\\n## JSON Mode\" -}}\n {{ \"\\n\\nProvide your response in JSON format. Adhere strictly to any schema given by the user.\" -}}\n {{ \" If an appropriate JSON format exists, use it without modification.\" -}}\n {% endif %}\n {{- \"\\n\" + active_modes_suffix -}}\n{% endmacro %}\n{##}\n{% macro get_last_user_index(messages) %}\n {% set ns.last_user_index = 0 %}\n {% for message in messages %}\n {% if message.role == 'user' %}\n {% set ns.last_user_index = loop.index0 %}\n {% endif %}\n {% endfor %}\n {{- ns.last_user_index -}}\n{% endmacro %}\n{##}\n{% macro handle_last_system_message(documents, knobs, use_documents, use_knobs) %}\n {{- bom_str + handle_role(\"system\") -}}\n {% set macros_to_call = [] %}\n {% set params_for_macros = [] %}\n {% if use_documents %}\n {% set macros_to_call = macros_to_call + [handle_documents] %}\n {% set params_for_macros = params_for_macros + [[documents]] %}\n {% endif %}\n {% if use_knobs %}\n {% set macros_to_call = macros_to_call + [handle_knobs] %}\n {% set params_for_macros = params_for_macros + [[knobs]] %}\n {% endif %}\n {% for i in range(macros_to_call|length) %}\n {% if i > 0 %}\n {{- \"\\n\\n\" -}}\n {% endif %}\n {{- macros_to_call[i](*params_for_macros[i]) -}}\n {% endfor %}\n {% set ns.message_count = ns.message_count + 1 %}\n{% endmacro %}\n{##}\n{% macro handle_role(role, add_space=True) %}\n {{- \"<|\" + role + \"|>\" -}}\n {% if add_space %}\n {{- \" \" -}}\n {% endif %}\n{% endmacro %}\n{##}\n{% macro is_param_set(param, field=none, is_list=False) %}\n {% if field is not none %}\n {% if field in param %}\n {% set param = param[field] %}\n {% else %}\n {% set param = none %}\n {% endif %}\n {% endif %}\n {% set is_defined = param is defined and param is not none %}\n {% if is_list %}\n {% set ns.is_last_checked_defined = is_defined and param|length > 0 %}\n {% else %}\n {% set ns.is_last_checked_defined = is_defined %}\n {% endif %}\n{% endmacro %}\n{##}\n{##}\n{# Template #}\n{{- \"<|startoftext|>\" -}}\n{% set _ = is_param_set(system_message) %}\n{% set is_system_message_set = ns.is_last_checked_defined %}\n{% set _ = is_param_set(tools, is_list=True) %}\n{% set is_tools_set = ns.is_last_checked_defined %}\n{% set has_system_message = (is_system_message_set or is_tools_set) %}\n{% if has_system_message %}\n {{- 
handle_first_system_message(system_message, tools) -}}\n{% endif %}\n{% set last_user_index = get_last_user_index(loop_messages)|int %}\n{% for message in loop_messages %}\n {% if loop.index0 == last_user_index %}\n {% set _ = is_param_set(documents, is_list=True) %}\n {% set use_documents = ns.is_last_checked_defined %}\n {% set _ = is_param_set(knobs) %}\n {% set use_knobs = ns.is_last_checked_defined and knobs.is_set %}\n {% set add_last_system_message = use_documents or use_knobs %}\n {% if add_last_system_message %}\n {% if ns.message_count > 0 %}\n {{- eom_str -}}\n {% endif %}\n {{- handle_last_system_message(documents, knobs, use_documents, use_knobs) -}}\n {% endif %}\n {% endif %}\n {% set role = message.role %}\n {% set _ = is_param_set(message, field=\"name\") %}\n {% set is_message_name_set = ns.is_last_checked_defined %}\n {% if is_message_name_set %}\n {% set message_prefix = handle_role(role) + \"(\" + message.name + \")\" %}\n {% else %}\n {% set message_prefix = handle_role(role) %}\n {% endif %}\n {% set content = (message.content or \"\") %}\n {% if content is not string %}\n {% set content = content|tojson %}\n {% endif %}\n {% if ns.message_count > 0 %}\n {{- eom_str -}}\n {% endif %}\n {{- bom_str + message_prefix + content -}}\n {% set _ = is_param_set(message, field=\"tool_calls\", is_list=True) %}\n {% set is_tool_calls_set = ns.is_last_checked_defined %}\n {% if role == \"assistant\" and is_tool_calls_set %}\n {{- handle_tool_calls(message.tool_calls) -}}\n {% endif %}\n {% set _ = is_param_set(message, field=\"citations\", is_list=True) %}\n {% set is_citations_set = ns.is_last_checked_defined %}\n {% if role == \"assistant\" and is_citations_set %}\n {{- citations_prefix + message.citations|map(attribute=\"document_id\")|list|string + citations_suffix -}}\n {% endif %}\n {% set ns.message_count = ns.message_count + 1 %}\n{% endfor %}\n{% if add_generation_prompt %}\n {% if ns.message_count > 0 %}\n {{- eom_str -}}\n {% endif %}\n {{- bom_str + handle_role(role_to_predict, add_space=False) -}}\n {% set _ = is_param_set(generation_preamble) %}\n {% set is_generation_preamble_set = ns.is_last_checked_defined %}\n {% if is_generation_preamble_set and generation_preamble.strip() != \"\" %}\n {{- \" \" + generation_preamble -}}\n {% endif %}\n {% set ns.message_count = ns.message_count + 1 %}\n{% else %}\n {% if ns.message_count > 0 %}\n {{- eom_str -}}\n {% endif %}\n{% endif %}\n",
187
+ "clean_up_tokenization_spaces": false,
188
+ "eos_token": "<|endoftext|>",
189
+ "legacy": true,
190
+ "model_max_length": 1000000000000000019884624838656,
191
+ "pad_token": "<|pad|>",
192
+ "spaces_between_special_tokens": false,
193
+ "tokenizer_class": "LlamaTokenizer",
194
+ "unk_token": "<|unk|>",
195
+ "use_default_system_prompt": false
196
+ }
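
The `chat_template` above is a Jinja template that wraps each turn in the `<|bom|>`/`<|eom|>` markers and role tokens listed in `added_tokens_decoder`. A minimal sketch of using it (the tokenizer files in this upload mirror the base model's, so the base repo id is used here as an assumption):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba-1.5-Mini")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does this LoRA adapter change?"},
]

# Render the conversation with the template from tokenizer_config.json,
# ending with an open <|assistant|> turn ready for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```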
trainer_state.json ADDED
@@ -0,0 +1,969 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.11943863839952225,
5
+ "eval_steps": 100,
6
+ "global_step": 1200,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0009953219866626853,
13
+ "grad_norm": 1.912980556488037,
14
+ "learning_rate": 9.995023390066686e-06,
15
+ "loss": 1.8703,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.0019906439733253707,
20
+ "grad_norm": 1.866821050643921,
21
+ "learning_rate": 9.990046780133374e-06,
22
+ "loss": 1.8723,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.002985965959988056,
27
+ "grad_norm": 2.058809280395508,
28
+ "learning_rate": 9.985070170200061e-06,
29
+ "loss": 1.8097,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.003981287946650741,
34
+ "grad_norm": 1.459013819694519,
35
+ "learning_rate": 9.980093560266747e-06,
36
+ "loss": 1.7456,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.004976609933313427,
41
+ "grad_norm": 0.9095586538314819,
42
+ "learning_rate": 9.975116950333434e-06,
43
+ "loss": 1.7195,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.005971931919976112,
48
+ "grad_norm": 1.1065226793289185,
49
+ "learning_rate": 9.970140340400121e-06,
50
+ "loss": 1.6502,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.0069672539066387975,
55
+ "grad_norm": 0.8301252126693726,
56
+ "learning_rate": 9.965163730466807e-06,
57
+ "loss": 1.5699,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.007962575893301483,
62
+ "grad_norm": 1.0762828588485718,
63
+ "learning_rate": 9.960187120533493e-06,
64
+ "loss": 1.5072,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.008957897879964169,
69
+ "grad_norm": 1.0814900398254395,
70
+ "learning_rate": 9.95521051060018e-06,
71
+ "loss": 1.4369,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.009953219866626855,
76
+ "grad_norm": 1.3561326265335083,
77
+ "learning_rate": 9.950233900666867e-06,
78
+ "loss": 1.3467,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.009953219866626855,
83
+ "eval_loss": 1.2846794128417969,
84
+ "eval_runtime": 147.6242,
85
+ "eval_samples_per_second": 1.375,
86
+ "eval_steps_per_second": 0.691,
87
+ "step": 100
88
+ },
89
+ {
90
+ "epoch": 0.010948541853289539,
91
+ "grad_norm": 1.438547968864441,
92
+ "learning_rate": 9.945257290733553e-06,
93
+ "loss": 1.2222,
94
+ "step": 110
95
+ },
96
+ {
97
+ "epoch": 0.011943863839952225,
98
+ "grad_norm": 1.402588963508606,
99
+ "learning_rate": 9.94028068080024e-06,
100
+ "loss": 1.1001,
101
+ "step": 120
102
+ },
103
+ {
104
+ "epoch": 0.012939185826614909,
105
+ "grad_norm": 1.4357985258102417,
106
+ "learning_rate": 9.935304070866926e-06,
107
+ "loss": 0.9657,
108
+ "step": 130
109
+ },
110
+ {
111
+ "epoch": 0.013934507813277595,
112
+ "grad_norm": 2.137953042984009,
113
+ "learning_rate": 9.930327460933613e-06,
114
+ "loss": 0.8211,
115
+ "step": 140
116
+ },
117
+ {
118
+ "epoch": 0.014929829799940281,
119
+ "grad_norm": 1.374299168586731,
120
+ "learning_rate": 9.925350851000299e-06,
121
+ "loss": 0.7142,
122
+ "step": 150
123
+ },
124
+ {
125
+ "epoch": 0.015925151786602965,
126
+ "grad_norm": 1.1510456800460815,
127
+ "learning_rate": 9.920374241066986e-06,
128
+ "loss": 0.656,
129
+ "step": 160
130
+ },
131
+ {
132
+ "epoch": 0.01692047377326565,
133
+ "grad_norm": 1.0226788520812988,
134
+ "learning_rate": 9.915397631133673e-06,
135
+ "loss": 0.6212,
136
+ "step": 170
137
+ },
138
+ {
139
+ "epoch": 0.017915795759928337,
140
+ "grad_norm": 0.9365411400794983,
141
+ "learning_rate": 9.910421021200359e-06,
142
+ "loss": 0.6069,
143
+ "step": 180
144
+ },
145
+ {
146
+ "epoch": 0.018911117746591023,
147
+ "grad_norm": 0.6880003213882446,
148
+ "learning_rate": 9.905444411267046e-06,
149
+ "loss": 0.6128,
150
+ "step": 190
151
+ },
152
+ {
153
+ "epoch": 0.01990643973325371,
154
+ "grad_norm": 1.1190361976623535,
155
+ "learning_rate": 9.900467801333732e-06,
156
+ "loss": 0.5426,
157
+ "step": 200
158
+ },
159
+ {
160
+ "epoch": 0.01990643973325371,
161
+ "eval_loss": 0.5788590908050537,
162
+ "eval_runtime": 147.511,
163
+ "eval_samples_per_second": 1.376,
164
+ "eval_steps_per_second": 0.691,
165
+ "step": 200
166
+ },
167
+ {
168
+ "epoch": 0.02090176171991639,
169
+ "grad_norm": 1.184279441833496,
170
+ "learning_rate": 9.895491191400419e-06,
171
+ "loss": 0.5887,
172
+ "step": 210
173
+ },
174
+ {
175
+ "epoch": 0.021897083706579078,
176
+ "grad_norm": 0.7627615928649902,
177
+ "learning_rate": 9.890514581467106e-06,
178
+ "loss": 0.5433,
179
+ "step": 220
180
+ },
181
+ {
182
+ "epoch": 0.022892405693241764,
183
+ "grad_norm": 0.7858164310455322,
184
+ "learning_rate": 9.885537971533792e-06,
185
+ "loss": 0.5843,
186
+ "step": 230
187
+ },
188
+ {
189
+ "epoch": 0.02388772767990445,
190
+ "grad_norm": 0.695697009563446,
191
+ "learning_rate": 9.880561361600478e-06,
192
+ "loss": 0.5365,
193
+ "step": 240
194
+ },
195
+ {
196
+ "epoch": 0.024883049666567136,
197
+ "grad_norm": 0.8994197845458984,
198
+ "learning_rate": 9.875584751667165e-06,
199
+ "loss": 0.5662,
200
+ "step": 250
201
+ },
202
+ {
203
+ "epoch": 0.025878371653229818,
204
+ "grad_norm": 0.8016309142112732,
205
+ "learning_rate": 9.870608141733852e-06,
206
+ "loss": 0.5592,
207
+ "step": 260
208
+ },
209
+ {
210
+ "epoch": 0.026873693639892504,
211
+ "grad_norm": 0.8534384369850159,
212
+ "learning_rate": 9.865631531800538e-06,
213
+ "loss": 0.5248,
214
+ "step": 270
215
+ },
216
+ {
217
+ "epoch": 0.02786901562655519,
218
+ "grad_norm": 0.9857029914855957,
219
+ "learning_rate": 9.860654921867225e-06,
220
+ "loss": 0.5294,
221
+ "step": 280
222
+ },
223
+ {
224
+ "epoch": 0.028864337613217876,
225
+ "grad_norm": 0.7766090631484985,
226
+ "learning_rate": 9.855678311933912e-06,
227
+ "loss": 0.5198,
228
+ "step": 290
229
+ },
230
+ {
231
+ "epoch": 0.029859659599880562,
232
+ "grad_norm": 0.6832401752471924,
233
+ "learning_rate": 9.850701702000598e-06,
234
+ "loss": 0.5844,
235
+ "step": 300
236
+ },
237
+ {
238
+ "epoch": 0.029859659599880562,
239
+ "eval_loss": 0.536589503288269,
240
+ "eval_runtime": 147.4968,
241
+ "eval_samples_per_second": 1.376,
242
+ "eval_steps_per_second": 0.692,
243
+ "step": 300
244
+ },
245
+ {
246
+ "epoch": 0.030854981586543248,
247
+ "grad_norm": 0.7720848917961121,
248
+ "learning_rate": 9.845725092067284e-06,
249
+ "loss": 0.5365,
250
+ "step": 310
251
+ },
252
+ {
253
+ "epoch": 0.03185030357320593,
254
+ "grad_norm": 0.7022100687026978,
255
+ "learning_rate": 9.840748482133971e-06,
256
+ "loss": 0.4841,
257
+ "step": 320
258
+ },
259
+ {
260
+ "epoch": 0.03284562555986862,
261
+ "grad_norm": 1.0030310153961182,
262
+ "learning_rate": 9.835771872200658e-06,
263
+ "loss": 0.4635,
264
+ "step": 330
265
+ },
266
+ {
267
+ "epoch": 0.0338409475465313,
268
+ "grad_norm": 0.8628882765769958,
269
+ "learning_rate": 9.830795262267344e-06,
270
+ "loss": 0.4932,
271
+ "step": 340
272
+ },
273
+ {
274
+ "epoch": 0.034836269533193985,
275
+ "grad_norm": 0.7178316712379456,
276
+ "learning_rate": 9.825818652334031e-06,
277
+ "loss": 0.6057,
278
+ "step": 350
279
+ },
280
+ {
281
+ "epoch": 0.035831591519856675,
282
+ "grad_norm": 0.9564626216888428,
283
+ "learning_rate": 9.820842042400718e-06,
284
+ "loss": 0.5371,
285
+ "step": 360
286
+ },
287
+ {
288
+ "epoch": 0.03682691350651936,
289
+ "grad_norm": 0.7041760683059692,
290
+ "learning_rate": 9.815865432467404e-06,
291
+ "loss": 0.513,
292
+ "step": 370
293
+ },
294
+ {
295
+ "epoch": 0.037822235493182046,
296
+ "grad_norm": 1.0203750133514404,
297
+ "learning_rate": 9.81088882253409e-06,
298
+ "loss": 0.5118,
299
+ "step": 380
300
+ },
301
+ {
302
+ "epoch": 0.03881755747984473,
303
+ "grad_norm": 0.8765382170677185,
304
+ "learning_rate": 9.805912212600777e-06,
305
+ "loss": 0.4529,
306
+ "step": 390
307
+ },
308
+ {
309
+ "epoch": 0.03981287946650742,
310
+ "grad_norm": 0.9951983690261841,
311
+ "learning_rate": 9.800935602667464e-06,
312
+ "loss": 0.5336,
313
+ "step": 400
314
+ },
315
+ {
316
+ "epoch": 0.03981287946650742,
317
+ "eval_loss": 0.5151349306106567,
318
+ "eval_runtime": 147.6615,
319
+ "eval_samples_per_second": 1.375,
320
+ "eval_steps_per_second": 0.691,
321
+ "step": 400
322
+ },
323
+ {
324
+ "epoch": 0.0408082014531701,
325
+ "grad_norm": 0.7691435813903809,
326
+ "learning_rate": 9.79595899273415e-06,
327
+ "loss": 0.506,
328
+ "step": 410
329
+ },
330
+ {
331
+ "epoch": 0.04180352343983278,
332
+ "grad_norm": 1.1955533027648926,
333
+ "learning_rate": 9.790982382800837e-06,
334
+ "loss": 0.4692,
335
+ "step": 420
336
+ },
337
+ {
338
+ "epoch": 0.04279884542649547,
339
+ "grad_norm": 1.128085732460022,
340
+ "learning_rate": 9.786005772867525e-06,
341
+ "loss": 0.4608,
342
+ "step": 430
343
+ },
344
+ {
345
+ "epoch": 0.043794167413158155,
346
+ "grad_norm": 0.5518949627876282,
347
+ "learning_rate": 9.78102916293421e-06,
348
+ "loss": 0.5006,
349
+ "step": 440
350
+ },
351
+ {
352
+ "epoch": 0.044789489399820845,
353
+ "grad_norm": 0.7164484858512878,
354
+ "learning_rate": 9.776052553000896e-06,
355
+ "loss": 0.4996,
356
+ "step": 450
357
+ },
358
+ {
359
+ "epoch": 0.04578481138648353,
360
+ "grad_norm": 0.5959630012512207,
361
+ "learning_rate": 9.771075943067583e-06,
362
+ "loss": 0.4843,
363
+ "step": 460
364
+ },
365
+ {
366
+ "epoch": 0.04678013337314621,
367
+ "grad_norm": 0.743648111820221,
368
+ "learning_rate": 9.76609933313427e-06,
369
+ "loss": 0.4363,
370
+ "step": 470
371
+ },
372
+ {
373
+ "epoch": 0.0477754553598089,
374
+ "grad_norm": 0.8757079243659973,
375
+ "learning_rate": 9.761122723200956e-06,
376
+ "loss": 0.4665,
377
+ "step": 480
378
+ },
379
+ {
380
+ "epoch": 0.04877077734647158,
381
+ "grad_norm": 1.0122153759002686,
382
+ "learning_rate": 9.756146113267643e-06,
383
+ "loss": 0.492,
384
+ "step": 490
385
+ },
386
+ {
387
+ "epoch": 0.04976609933313427,
388
+ "grad_norm": 0.6179729700088501,
389
+ "learning_rate": 9.751169503334329e-06,
390
+ "loss": 0.5022,
391
+ "step": 500
392
+ },
393
+ {
394
+ "epoch": 0.04976609933313427,
395
+ "eval_loss": 0.4993921220302582,
396
+ "eval_runtime": 147.7401,
397
+ "eval_samples_per_second": 1.374,
398
+ "eval_steps_per_second": 0.69,
399
+ "step": 500
400
+ },
401
+ {
402
+ "epoch": 0.050761421319796954,
403
+ "grad_norm": 0.952812671661377,
404
+ "learning_rate": 9.746192893401016e-06,
405
+ "loss": 0.4901,
406
+ "step": 510
407
+ },
408
+ {
409
+ "epoch": 0.051756743306459636,
410
+ "grad_norm": 0.6715916991233826,
411
+ "learning_rate": 9.741216283467702e-06,
412
+ "loss": 0.5055,
413
+ "step": 520
414
+ },
415
+ {
416
+ "epoch": 0.052752065293122326,
417
+ "grad_norm": 0.674640953540802,
418
+ "learning_rate": 9.736239673534389e-06,
419
+ "loss": 0.4874,
420
+ "step": 530
421
+ },
422
+ {
423
+ "epoch": 0.05374738727978501,
424
+ "grad_norm": 0.7867962718009949,
425
+ "learning_rate": 9.731263063601075e-06,
426
+ "loss": 0.4956,
427
+ "step": 540
428
+ },
429
+ {
430
+ "epoch": 0.0547427092664477,
431
+ "grad_norm": 0.9035332202911377,
432
+ "learning_rate": 9.726286453667762e-06,
433
+ "loss": 0.499,
434
+ "step": 550
435
+ },
436
+ {
437
+ "epoch": 0.05573803125311038,
438
+ "grad_norm": 0.7009295225143433,
439
+ "learning_rate": 9.72130984373445e-06,
440
+ "loss": 0.5034,
441
+ "step": 560
442
+ },
443
+ {
444
+ "epoch": 0.05673335323977307,
445
+ "grad_norm": 0.7018862366676331,
446
+ "learning_rate": 9.716333233801135e-06,
447
+ "loss": 0.5137,
448
+ "step": 570
449
+ },
450
+ {
451
+ "epoch": 0.05772867522643575,
452
+ "grad_norm": 0.7812825441360474,
453
+ "learning_rate": 9.711356623867822e-06,
454
+ "loss": 0.4724,
455
+ "step": 580
456
+ },
457
+ {
458
+ "epoch": 0.058723997213098435,
459
+ "grad_norm": 0.6245225071907043,
460
+ "learning_rate": 9.70638001393451e-06,
461
+ "loss": 0.4446,
462
+ "step": 590
463
+ },
464
+ {
465
+ "epoch": 0.059719319199761124,
466
+ "grad_norm": 0.9083976149559021,
467
+ "learning_rate": 9.701403404001195e-06,
468
+ "loss": 0.4884,
469
+ "step": 600
470
+ },
471
+ {
472
+ "epoch": 0.059719319199761124,
473
+ "eval_loss": 0.4891846477985382,
474
+ "eval_runtime": 147.5284,
475
+ "eval_samples_per_second": 1.376,
476
+ "eval_steps_per_second": 0.691,
477
+ "step": 600
478
+ },
479
+ {
480
+ "epoch": 0.06071464118642381,
481
+ "grad_norm": 0.6195352673530579,
482
+ "learning_rate": 9.69642679406788e-06,
483
+ "loss": 0.5121,
484
+ "step": 610
485
+ },
486
+ {
487
+ "epoch": 0.061709963173086496,
488
+ "grad_norm": 0.8068727254867554,
489
+ "learning_rate": 9.691450184134568e-06,
490
+ "loss": 0.4689,
491
+ "step": 620
492
+ },
493
+ {
494
+ "epoch": 0.06270528515974919,
495
+ "grad_norm": 1.0427749156951904,
496
+ "learning_rate": 9.686473574201255e-06,
497
+ "loss": 0.4968,
498
+ "step": 630
499
+ },
500
+ {
501
+ "epoch": 0.06370060714641186,
502
+ "grad_norm": 0.698349118232727,
503
+ "learning_rate": 9.681496964267941e-06,
504
+ "loss": 0.4691,
505
+ "step": 640
506
+ },
507
+ {
508
+ "epoch": 0.06469592913307455,
509
+ "grad_norm": 0.9104384183883667,
510
+ "learning_rate": 9.676520354334628e-06,
511
+ "loss": 0.4775,
512
+ "step": 650
513
+ },
514
+ {
515
+ "epoch": 0.06569125111973724,
516
+ "grad_norm": 0.8729726076126099,
517
+ "learning_rate": 9.671543744401316e-06,
518
+ "loss": 0.5201,
519
+ "step": 660
520
+ },
521
+ {
522
+ "epoch": 0.06668657310639992,
523
+ "grad_norm": 0.9858236908912659,
524
+ "learning_rate": 9.666567134468001e-06,
525
+ "loss": 0.4268,
526
+ "step": 670
527
+ },
528
+ {
529
+ "epoch": 0.0676818950930626,
530
+ "grad_norm": 2.322754383087158,
531
+ "learning_rate": 9.661590524534687e-06,
532
+ "loss": 0.4744,
533
+ "step": 680
534
+ },
535
+ {
536
+ "epoch": 0.0686772170797253,
537
+ "grad_norm": 0.9327623248100281,
538
+ "learning_rate": 9.656613914601374e-06,
539
+ "loss": 0.4355,
540
+ "step": 690
541
+ },
542
+ {
543
+ "epoch": 0.06967253906638797,
544
+ "grad_norm": 0.6949413418769836,
545
+ "learning_rate": 9.651637304668062e-06,
546
+ "loss": 0.465,
547
+ "step": 700
548
+ },
549
+ {
550
+ "epoch": 0.06967253906638797,
551
+ "eval_loss": 0.4817120432853699,
552
+ "eval_runtime": 147.5643,
553
+ "eval_samples_per_second": 1.376,
554
+ "eval_steps_per_second": 0.691,
555
+ "step": 700
556
+ },
557
+ {
558
+ "epoch": 0.07066786105305066,
559
+ "grad_norm": 0.5208165049552917,
560
+ "learning_rate": 9.646660694734747e-06,
561
+ "loss": 0.4973,
562
+ "step": 710
563
+ },
564
+ {
565
+ "epoch": 0.07166318303971335,
566
+ "grad_norm": 0.8434884548187256,
567
+ "learning_rate": 9.641684084801434e-06,
568
+ "loss": 0.4721,
569
+ "step": 720
570
+ },
571
+ {
572
+ "epoch": 0.07265850502637604,
573
+ "grad_norm": 0.7161769866943359,
574
+ "learning_rate": 9.636707474868122e-06,
575
+ "loss": 0.498,
576
+ "step": 730
577
+ },
578
+ {
579
+ "epoch": 0.07365382701303871,
580
+ "grad_norm": 0.7036088705062866,
581
+ "learning_rate": 9.631730864934807e-06,
582
+ "loss": 0.4672,
583
+ "step": 740
584
+ },
585
+ {
586
+ "epoch": 0.0746491489997014,
587
+ "grad_norm": 0.9175013899803162,
588
+ "learning_rate": 9.626754255001493e-06,
589
+ "loss": 0.4781,
590
+ "step": 750
591
+ },
592
+ {
593
+ "epoch": 0.07564447098636409,
594
+ "grad_norm": 0.678519606590271,
595
+ "learning_rate": 9.62177764506818e-06,
596
+ "loss": 0.4048,
597
+ "step": 760
598
+ },
599
+ {
600
+ "epoch": 0.07663979297302677,
601
+ "grad_norm": 0.6295528411865234,
602
+ "learning_rate": 9.616801035134868e-06,
603
+ "loss": 0.449,
604
+ "step": 770
605
+ },
606
+ {
607
+ "epoch": 0.07763511495968946,
608
+ "grad_norm": 0.5424385666847229,
609
+ "learning_rate": 9.611824425201553e-06,
610
+ "loss": 0.4394,
611
+ "step": 780
612
+ },
613
+ {
614
+ "epoch": 0.07863043694635215,
615
+ "grad_norm": 0.508836030960083,
616
+ "learning_rate": 9.60684781526824e-06,
617
+ "loss": 0.4317,
618
+ "step": 790
619
+ },
620
+ {
621
+ "epoch": 0.07962575893301484,
622
+ "grad_norm": 0.6004147529602051,
623
+ "learning_rate": 9.601871205334926e-06,
624
+ "loss": 0.4308,
625
+ "step": 800
626
+ },
627
+ {
628
+ "epoch": 0.07962575893301484,
629
+ "eval_loss": 0.47557342052459717,
630
+ "eval_runtime": 147.5812,
631
+ "eval_samples_per_second": 1.376,
632
+ "eval_steps_per_second": 0.691,
633
+ "step": 800
634
+ },
635
+ {
636
+ "epoch": 0.08062108091967751,
637
+ "grad_norm": 0.5553786754608154,
638
+ "learning_rate": 9.596894595401613e-06,
639
+ "loss": 0.4376,
640
+ "step": 810
641
+ },
642
+ {
643
+ "epoch": 0.0816164029063402,
644
+ "grad_norm": 0.7254445552825928,
645
+ "learning_rate": 9.591917985468299e-06,
646
+ "loss": 0.4884,
647
+ "step": 820
648
+ },
649
+ {
650
+ "epoch": 0.08261172489300289,
651
+ "grad_norm": 0.7175013422966003,
652
+ "learning_rate": 9.586941375534986e-06,
653
+ "loss": 0.4167,
654
+ "step": 830
655
+ },
656
+ {
657
+ "epoch": 0.08360704687966557,
658
+ "grad_norm": 0.6464620232582092,
659
+ "learning_rate": 9.581964765601674e-06,
660
+ "loss": 0.4622,
661
+ "step": 840
662
+ },
663
+ {
664
+ "epoch": 0.08460236886632826,
665
+ "grad_norm": 0.6999176144599915,
666
+ "learning_rate": 9.57698815566836e-06,
667
+ "loss": 0.4708,
668
+ "step": 850
669
+ },
670
+ {
671
+ "epoch": 0.08559769085299095,
672
+ "grad_norm": 0.7939727306365967,
673
+ "learning_rate": 9.572011545735047e-06,
674
+ "loss": 0.4633,
675
+ "step": 860
676
+ },
677
+ {
678
+ "epoch": 0.08659301283965362,
679
+ "grad_norm": 0.473017156124115,
680
+ "learning_rate": 9.567034935801732e-06,
681
+ "loss": 0.4585,
682
+ "step": 870
683
+ },
684
+ {
685
+ "epoch": 0.08758833482631631,
686
+ "grad_norm": 0.7265183329582214,
687
+ "learning_rate": 9.56205832586842e-06,
688
+ "loss": 0.4485,
689
+ "step": 880
690
+ },
691
+ {
692
+ "epoch": 0.088583656812979,
693
+ "grad_norm": 0.539735734462738,
694
+ "learning_rate": 9.557081715935105e-06,
695
+ "loss": 0.475,
696
+ "step": 890
697
+ },
698
+ {
699
+ "epoch": 0.08957897879964169,
700
+ "grad_norm": 0.7587076425552368,
701
+ "learning_rate": 9.552105106001792e-06,
702
+ "loss": 0.4347,
703
+ "step": 900
704
+ },
705
+ {
706
+ "epoch": 0.08957897879964169,
707
+ "eval_loss": 0.4690374732017517,
708
+ "eval_runtime": 147.5672,
709
+ "eval_samples_per_second": 1.376,
710
+ "eval_steps_per_second": 0.691,
711
+ "step": 900
712
+ },
713
+ {
714
+ "epoch": 0.09057430078630437,
715
+ "grad_norm": 0.7549741864204407,
716
+ "learning_rate": 9.547128496068478e-06,
717
+ "loss": 0.4434,
718
+ "step": 910
719
+ },
720
+ {
721
+ "epoch": 0.09156962277296705,
722
+ "grad_norm": 0.686689555644989,
723
+ "learning_rate": 9.542151886135165e-06,
724
+ "loss": 0.4052,
725
+ "step": 920
726
+ },
727
+ {
728
+ "epoch": 0.09256494475962974,
729
+ "grad_norm": 1.02870512008667,
730
+ "learning_rate": 9.537175276201853e-06,
731
+ "loss": 0.4806,
732
+ "step": 930
733
+ },
734
+ {
735
+ "epoch": 0.09356026674629242,
736
+ "grad_norm": 0.7680675983428955,
737
+ "learning_rate": 9.532198666268538e-06,
738
+ "loss": 0.4609,
739
+ "step": 940
740
+ },
741
+ {
742
+ "epoch": 0.09455558873295511,
743
+ "grad_norm": 0.5478435754776001,
744
+ "learning_rate": 9.527222056335224e-06,
745
+ "loss": 0.4171,
746
+ "step": 950
747
+ },
748
+ {
749
+ "epoch": 0.0955509107196178,
750
+ "grad_norm": 0.5974985361099243,
751
+ "learning_rate": 9.522245446401913e-06,
752
+ "loss": 0.4686,
753
+ "step": 960
754
+ },
755
+ {
756
+ "epoch": 0.09654623270628049,
757
+ "grad_norm": 0.997151792049408,
758
+ "learning_rate": 9.517268836468598e-06,
759
+ "loss": 0.4676,
760
+ "step": 970
761
+ },
762
+ {
763
+ "epoch": 0.09754155469294316,
764
+ "grad_norm": 0.6366075277328491,
765
+ "learning_rate": 9.512292226535284e-06,
766
+ "loss": 0.4467,
767
+ "step": 980
768
+ },
769
+ {
770
+ "epoch": 0.09853687667960585,
771
+ "grad_norm": 0.5682553052902222,
772
+ "learning_rate": 9.507315616601971e-06,
773
+ "loss": 0.4772,
774
+ "step": 990
775
+ },
776
+ {
777
+ "epoch": 0.09953219866626854,
778
+ "grad_norm": 0.5869882106781006,
779
+ "learning_rate": 9.502339006668659e-06,
780
+ "loss": 0.3976,
781
+ "step": 1000
782
+ },
783
+ {
784
+ "epoch": 0.09953219866626854,
785
+ "eval_loss": 0.46156319975852966,
786
+ "eval_runtime": 147.6656,
787
+ "eval_samples_per_second": 1.375,
788
+ "eval_steps_per_second": 0.691,
789
+ "step": 1000
790
+ },
791
+ {
792
+ "epoch": 0.10052752065293122,
793
+ "grad_norm": 0.5758237838745117,
794
+ "learning_rate": 9.497362396735344e-06,
795
+ "loss": 0.4528,
796
+ "step": 1010
797
+ },
798
+ {
799
+ "epoch": 0.10152284263959391,
800
+ "grad_norm": 0.700281023979187,
801
+ "learning_rate": 9.492385786802032e-06,
802
+ "loss": 0.4545,
803
+ "step": 1020
804
+ },
805
+ {
806
+ "epoch": 0.1025181646262566,
807
+ "grad_norm": 1.1320914030075073,
808
+ "learning_rate": 9.487409176868719e-06,
809
+ "loss": 0.4331,
810
+ "step": 1030
811
+ },
812
+ {
813
+ "epoch": 0.10351348661291927,
814
+ "grad_norm": 0.6469867825508118,
815
+ "learning_rate": 9.482432566935405e-06,
816
+ "loss": 0.3759,
817
+ "step": 1040
818
+ },
819
+ {
820
+ "epoch": 0.10450880859958196,
821
+ "grad_norm": 0.9471383094787598,
822
+ "learning_rate": 9.47745595700209e-06,
823
+ "loss": 0.4041,
824
+ "step": 1050
825
+ },
826
+ {
827
+ "epoch": 0.10550413058624465,
828
+ "grad_norm": 0.5729160904884338,
829
+ "learning_rate": 9.472479347068777e-06,
830
+ "loss": 0.4871,
831
+ "step": 1060
832
+ },
833
+ {
834
+ "epoch": 0.10649945257290734,
835
+ "grad_norm": 0.642436683177948,
836
+ "learning_rate": 9.467502737135465e-06,
837
+ "loss": 0.3893,
838
+ "step": 1070
839
+ },
840
+ {
841
+ "epoch": 0.10749477455957002,
842
+ "grad_norm": 0.95659339427948,
843
+ "learning_rate": 9.46252612720215e-06,
844
+ "loss": 0.4486,
845
+ "step": 1080
846
+ },
847
+ {
848
+ "epoch": 0.1084900965462327,
849
+ "grad_norm": 0.6642667055130005,
850
+ "learning_rate": 9.457549517268838e-06,
851
+ "loss": 0.5168,
852
+ "step": 1090
853
+ },
854
+ {
855
+ "epoch": 0.1094854185328954,
856
+ "grad_norm": 0.5805796980857849,
857
+ "learning_rate": 9.452572907335525e-06,
858
+ "loss": 0.4019,
859
+ "step": 1100
860
+ },
861
+ {
862
+ "epoch": 0.1094854185328954,
863
+ "eval_loss": 0.4559178054332733,
864
+ "eval_runtime": 147.5891,
865
+ "eval_samples_per_second": 1.375,
866
+ "eval_steps_per_second": 0.691,
867
+ "step": 1100
868
+ },
869
+ {
870
+ "epoch": 0.11048074051955807,
871
+ "grad_norm": 0.7006909251213074,
872
+ "learning_rate": 9.44759629740221e-06,
873
+ "loss": 0.457,
874
+ "step": 1110
875
+ },
876
+ {
877
+ "epoch": 0.11147606250622076,
878
+ "grad_norm": 1.1821540594100952,
879
+ "learning_rate": 9.442619687468896e-06,
880
+ "loss": 0.3484,
881
+ "step": 1120
882
+ },
883
+ {
884
+ "epoch": 0.11247138449288345,
885
+ "grad_norm": 0.7232743501663208,
886
+ "learning_rate": 9.437643077535584e-06,
887
+ "loss": 0.417,
888
+ "step": 1130
889
+ },
890
+ {
891
+ "epoch": 0.11346670647954614,
892
+ "grad_norm": 0.6104183197021484,
893
+ "learning_rate": 9.43266646760227e-06,
894
+ "loss": 0.4821,
895
+ "step": 1140
896
+ },
897
+ {
898
+ "epoch": 0.11446202846620881,
899
+ "grad_norm": 0.5961386561393738,
900
+ "learning_rate": 9.427689857668956e-06,
901
+ "loss": 0.4834,
902
+ "step": 1150
903
+ },
904
+ {
905
+ "epoch": 0.1154573504528715,
906
+ "grad_norm": 0.5530894994735718,
907
+ "learning_rate": 9.422713247735644e-06,
908
+ "loss": 0.443,
909
+ "step": 1160
910
+ },
911
+ {
912
+ "epoch": 0.1164526724395342,
913
+ "grad_norm": 0.5148622393608093,
914
+ "learning_rate": 9.41773663780233e-06,
915
+ "loss": 0.4029,
916
+ "step": 1170
917
+ },
918
+ {
919
+ "epoch": 0.11744799442619687,
920
+ "grad_norm": 0.6148583292961121,
921
+ "learning_rate": 9.412760027869017e-06,
922
+ "loss": 0.4308,
923
+ "step": 1180
924
+ },
925
+ {
926
+ "epoch": 0.11844331641285956,
927
+ "grad_norm": 0.7840449213981628,
928
+ "learning_rate": 9.407783417935702e-06,
929
+ "loss": 0.499,
930
+ "step": 1190
931
+ },
932
+ {
933
+ "epoch": 0.11943863839952225,
934
+ "grad_norm": 0.6757422089576721,
935
+ "learning_rate": 9.40280680800239e-06,
936
+ "loss": 0.4263,
937
+ "step": 1200
938
+ },
939
+ {
940
+ "epoch": 0.11943863839952225,
941
+ "eval_loss": 0.4505193829536438,
942
+ "eval_runtime": 147.6664,
943
+ "eval_samples_per_second": 1.375,
944
+ "eval_steps_per_second": 0.691,
945
+ "step": 1200
946
+ }
947
+ ],
948
+ "logging_steps": 10,
949
+ "max_steps": 20094,
950
+ "num_input_tokens_seen": 0,
951
+ "num_train_epochs": 2,
952
+ "save_steps": 100,
953
+ "stateful_callbacks": {
954
+ "TrainerControl": {
955
+ "args": {
956
+ "should_epoch_stop": false,
957
+ "should_evaluate": false,
958
+ "should_log": false,
959
+ "should_save": true,
960
+ "should_training_stop": false
961
+ },
962
+ "attributes": {}
963
+ }
964
+ },
965
+ "total_flos": 4.756627798568276e+18,
966
+ "train_batch_size": 2,
967
+ "trial_name": null,
968
+ "trial_params": null
969
+ }
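
The `log_history` above records the training loss every 10 steps and the evaluation loss every 100 steps, up to step 1200 of 20094. A small sketch for pulling those curves out of this checkpoint (local file path assumed):

```python
import json

# Read the trainer state saved alongside this checkpoint.
with open("trainer_state.json") as f:
    state = json.load(f)

train = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

print(f"trained to step {state['global_step']} of {state['max_steps']}")
print("last train loss:", train[-1])   # e.g. (1200, 0.4263) in this upload
print("last eval loss:", evals[-1])    # e.g. (1200, 0.4505...) in this upload
```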
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c7aafcd6219383d43af3afb872b499570b7bc3857059e6738e1f455694febb5
size 5432