Text Generation
Transformers
ONNX
English
gpt_neox
vriveras committed · Commit 2ba3811 · 1 Parent(s): c1e9a28

Initial dolly-v2-7b Olive Optimized

README.md CHANGED
@@ -1,3 +1,176 @@
  ---
  license: mit
+ language:
+ - en
+ library_name: transformers
+ inference: false
+ datasets:
+ - databricks/databricks-dolly-15k
  ---
+ # dolly-v2-7b Model Card
+ ## Summary
+
+ Databricks’ `dolly-v2-7b` is an instruction-following large language model trained on the Databricks machine learning platform
+ and licensed for commercial use. Based on `pythia-6.9b`, Dolly is trained on ~15k instruction/response fine-tuning records
+ [`databricks-dolly-15k`](https://github.com/databrickslabs/dolly/tree/master/data) generated
+ by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation,
+ information extraction, open QA and summarization. `dolly-v2-7b` is not a state-of-the-art model, but it does exhibit surprisingly
+ high-quality instruction-following behavior not characteristic of the foundation model on which it is based.
+
+ Dolly v2 is also available in these other model sizes:
+
+ * [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b), a 12 billion parameter model based on `pythia-12b`
+ * [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b), a 2.8 billion parameter model based on `pythia-2.8b`
+
+ Please refer to the [dolly GitHub repo](https://github.com/databrickslabs/dolly#getting-started-with-response-generation) for tips on
+ running inference for various GPU configurations.
+
+ **Owner**: Databricks, Inc.
+
+ ## Model Overview
+ `dolly-v2-7b` is a 6.9 billion parameter causal language model created by [Databricks](https://databricks.com/) that is derived from
+ [EleutherAI’s](https://www.eleuther.ai/) [Pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b) and fine-tuned
+ on a [~15K record instruction corpus](https://github.com/databrickslabs/dolly/tree/master/data) generated by Databricks employees and released under a permissive license (CC-BY-SA).
+
+ ## Usage
+
+ To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` and `accelerate` libraries installed.
+ In a Databricks notebook you could run:
+
+ ```python
+ %pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
+ ```
+
+ The instruction following pipeline can be loaded using the `pipeline` function as shown below. This loads a custom `InstructionTextGenerationPipeline`
+ found in the model repo [here](https://huggingface.co/databricks/dolly-v2-3b/blob/main/instruct_pipeline.py), which is why `trust_remote_code=True` is required.
+ Including `torch_dtype=torch.bfloat16` is generally recommended if this type is supported in order to reduce memory usage. It does not appear to impact output quality.
+ It is also fine to remove it if there is sufficient memory.
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ generate_text = pipeline(model="databricks/dolly-v2-7b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
+ ```
+
+ You can then use the pipeline to answer instructions:
+
+ ```python
+ res = generate_text("Explain to me the difference between nuclear fission and fusion.")
+ print(res[0]["generated_text"])
+ ```
+
+ Alternatively, if you prefer not to use `trust_remote_code=True`, you can download [instruct_pipeline.py](https://huggingface.co/databricks/dolly-v2-3b/blob/main/instruct_pipeline.py),
+ store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:
+
+ ```python
+ import torch
+ from instruct_pipeline import InstructionTextGenerationPipeline
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-7b", padding_side="left")
+ model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-7b", device_map="auto", torch_dtype=torch.bfloat16)
+
+ generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
+ ```
+
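The copy of `instruct_pipeline.py` shipped in this repository also accepts a `streamer` argument that it forwards to `model.generate`, so generated tokens can be printed as they are produced. A minimal sketch (not part of the original model card; the `TextStreamer` settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from instruct_pipeline import InstructionTextGenerationPipeline

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-7b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-7b", device_map="auto")

# Stream decoded tokens to stdout as they are generated, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer, streamer=streamer)

res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
```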
+ ### LangChain Usage
+
+ To use the pipeline with LangChain, you must set `return_full_text=True`, as LangChain expects the full text to be returned
+ and the default for the pipeline is to only return the new text.
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ generate_text = pipeline(model="databricks/dolly-v2-7b", torch_dtype=torch.bfloat16,
+                          trust_remote_code=True, device_map="auto", return_full_text=True)
+ ```
+
+ You can create a prompt that either has only an instruction or has an instruction with context:
+
+ ```python
+ from langchain import PromptTemplate, LLMChain
+ from langchain.llms import HuggingFacePipeline
+
+ # template for an instruction with no input
+ prompt = PromptTemplate(
+     input_variables=["instruction"],
+     template="{instruction}")
+
+ # template for an instruction with input
+ prompt_with_context = PromptTemplate(
+     input_variables=["instruction", "context"],
+     template="{instruction}\n\nInput:\n{context}")
+
+ hf_pipeline = HuggingFacePipeline(pipeline=generate_text)
+
+ llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
+ llm_context_chain = LLMChain(llm=hf_pipeline, prompt=prompt_with_context)
+ ```
+
+ Example prediction using a simple instruction:
+
+ ```python
+ print(llm_chain.predict(instruction="Explain to me the difference between nuclear fission and fusion.").lstrip())
+ ```
+
+ Example prediction using an instruction with context:
+
+ ```python
+ context = """George Washington (February 22, 1732[b] – December 14, 1799) was an American military officer, statesman,
+ and Founding Father who served as the first president of the United States from 1789 to 1797."""
+
+ print(llm_context_chain.predict(instruction="When was George Washington president?", context=context).lstrip())
+ ```
+
+
+ ## Known Limitations
+
+ ### Performance Limitations
+ **`dolly-v2-7b` is not a state-of-the-art generative language model** and, though quantitative benchmarking is ongoing, is not designed to perform
+ competitively with more modern model architectures or models trained on larger pretraining corpora.
+
+ The Dolly model family is under active development, and so any list of shortcomings is unlikely to be exhaustive, but we include known limitations and misfires here as a means to document and share our preliminary findings with the community.
+ In particular, `dolly-v2-7b` struggles with: syntactically complex prompts, programming problems, mathematical operations, factual errors,
+ dates and times, open-ended question answering, hallucination, enumerating lists of specific length, stylistic mimicry, having a sense of humor, etc.
+ Moreover, we find that `dolly-v2-7b` does not have some capabilities, such as well-formatted letter writing, that are present in the original model.
+
+ ### Dataset Limitations
+ Like all language models, `dolly-v2-7b` reflects the content and limitations of its training corpora.
+
+ - **The Pile**: GPT-J’s pre-training corpus contains content mostly collected from the public internet, and like most web-scale datasets,
+ it contains content many users would find objectionable. As such, the model is likely to reflect these shortcomings, potentially overtly
+ in the case it is explicitly asked to produce objectionable content, and sometimes subtly, as in the case of biased or harmful implicit
+ associations.
+
+ - **`databricks-dolly-15k`**: The training data on which `dolly-v2-7b` is instruction tuned represents natural language instructions generated
+ by Databricks employees during a period spanning March and April 2023 and includes passages from Wikipedia as reference passages
+ for instruction categories like closed QA and summarization. To our knowledge it does not contain obscenity, intellectual property or
+ personally identifying information about non-public figures, but it may contain typos and factual errors.
+ The dataset may also reflect biases found in Wikipedia. Finally, the dataset likely reflects
+ the interests and semantic choices of Databricks employees, a demographic which is not representative of the global population at large.
+
+ Databricks is committed to ongoing research and development efforts to develop helpful, honest and harmless AI technologies that
+ maximize the potential of all individuals and organizations.
+
+ ### Benchmark Metrics
+
+ Below you'll find benchmark performance for various models on the [EleutherAI LLM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness);
+ model results are sorted by geometric mean to produce an intelligible ordering (a short worked example of this metric follows the table). As outlined above, these results demonstrate that `dolly-v2-7b` is not state of the art,
+ and in fact underperforms `dolly-v1-6b` in some evaluation benchmarks. We believe this owes to the composition and size of the underlying fine-tuning datasets,
+ but a robust statement as to the sources of these variations requires further study.
+
+ | model | openbookqa | arc_easy | winogrande | hellaswag | arc_challenge | piqa | boolq | gmean |
+ | --------------------------------- | ------------ | ---------- | ------------ | ----------- | --------------- | -------- | -------- | ---------|
+ | EleutherAI/pythia-2.8b | 0.348 | 0.585859 | 0.589582 | 0.591217 | 0.323379 | 0.73395 | 0.638226 | 0.523431 |
+ | EleutherAI/pythia-6.9b | 0.368 | 0.604798 | 0.608524 | 0.631548 | 0.343857 | 0.761153 | 0.6263 | 0.543567 |
+ | databricks/dolly-v2-3b | 0.384 | 0.611532 | 0.589582 | 0.650767 | 0.370307 | 0.742655 | 0.575535 | 0.544886 |
+ | EleutherAI/pythia-12b | 0.364 | 0.627104 | 0.636148 | 0.668094 | 0.346416 | 0.760065 | 0.673394 | 0.559676 |
+ | EleutherAI/gpt-j-6B | 0.382 | 0.621633 | 0.651144 | 0.662617 | 0.363481 | 0.761153 | 0.655963 | 0.565936 |
+ | databricks/dolly-v2-12b | 0.408 | 0.63931 | 0.616417 | 0.707927 | 0.388225 | 0.757889 | 0.568196 | 0.56781 |
+ | databricks/dolly-v2-7b | 0.392 | 0.633838 | 0.607735 | 0.686517 | 0.406997 | 0.750816 | 0.644037 | 0.573487 |
+ | databricks/dolly-v1-6b | 0.41 | 0.62963 | 0.643252 | 0.676758 | 0.384812 | 0.773667 | 0.687768 | 0.583431 |
+ | EleutherAI/gpt-neox-20b | 0.402 | 0.683923 | 0.656669 | 0.7142 | 0.408703 | 0.784004 | 0.695413 | 0.602236 |
+
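As an aside (illustrative only, not part of the original model card), the `gmean` column is simply the geometric mean of the seven per-task scores, which can be checked with NumPy:

```python
import numpy as np

# Per-task scores for databricks/dolly-v2-7b, taken from the table above.
scores = [0.392, 0.633838, 0.607735, 0.686517, 0.406997, 0.750816, 0.644037]

# Geometric mean: the n-th root of the product, computed via logs for stability.
gmean = float(np.exp(np.mean(np.log(scores))))
print(f"{gmean:.6f}")  # ~0.5735, in line with the gmean column
```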
+ # Happy Hacking!
_gpt_neox_layers.0_attention_rotary_emb_Constant_5_attr__value ADDED
Binary file (262 kB).
 
_gpt_neox_layers.0_attention_rotary_emb_Constant_attr__value ADDED
Binary file (262 kB).
 
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_name_or_path": "databricks/dolly-v2-7b",
+   "architectures": [
+     "GPTNeoXForCausalLM"
+   ],
+   "bos_token_id": 0,
+   "classifier_dropout": 0.1,
+   "custom_pipelines": {
+     "text-generation": {
+       "impl": "instruct_pipeline.InstructionTextGenerationPipeline",
+       "pt": "AutoModelForCausalLM",
+       "tf": "TFAutoModelForCausalLM"
+     }
+   },
+   "eos_token_id": 0,
+   "hidden_act": "gelu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 16384,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 2048,
+   "model_type": "gpt_neox",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "rotary_emb_base": 10000,
+   "rotary_pct": 0.25,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.29.0",
+   "use_cache": true,
+   "use_parallel_residual": true,
+   "vocab_size": 50280
+ }
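For orientation (an illustrative calculation, not code from this repo), these values imply the standard GPT-NeoX attention geometry: each of the 32 heads spans 4096 / 32 = 128 dimensions, and `rotary_pct` of each head receives rotary position embeddings.

```python
# Illustrative arithmetic relating the config.json values above under the
# standard GPT-NeoX convention.
hidden_size = 4096
num_attention_heads = 32
rotary_pct = 0.25

head_dim = hidden_size // num_attention_heads  # 128 dimensions per attention head
rotary_ndims = int(head_dim * rotary_pct)      # 32 dimensions rotated per head
print(head_dim, rotary_ndims)                  # 128 32
```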
decoder_model_merged.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ea722b6e7020eae65c844c168d5a97279d3c0d00fccce5bdcfa65688f6e96d6
+ size 4169900
decoder_model_merged.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b6b1e949e90e91c080f89c3aed058c04b8eb842684d9c8502b0b455ac36ef8c
+ size 13716054016
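The two files above are Git LFS pointers for the Olive-optimized ONNX decoder and its external weight data. As a rough, untested sketch of running such an export with ONNX Runtime through Optimum (the local path, `file_name`, and generation settings are assumptions, and compatibility with `ORTModelForCausalLM`'s expected input/output layout is not guaranteed):

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

repo_dir = "."  # local clone of this repository (assumption)
tokenizer = AutoTokenizer.from_pretrained(repo_dir)
model = ORTModelForCausalLM.from_pretrained(repo_dir, file_name="decoder_model_merged.onnx")

# Wrap the instruction in the same prompt format instruct_pipeline.py builds.
prompt = (
    "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n"
    "### Instruction:\nExplain to me the difference between nuclear fission and fusion.\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```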
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 0,
+   "transformers_version": "4.29.0"
+ }
instruct_pipeline.py ADDED
@@ -0,0 +1,208 @@
+ import logging
+ import re
+ from typing import List
+
+ import numpy as np
+ from transformers import Pipeline, PreTrainedTokenizer
+
+ from transformers.utils import is_tf_available
+ from transformers import TextStreamer
+
+ if is_tf_available():
+     import tensorflow as tf
+
+ logger = logging.getLogger(__name__)
+
+ INSTRUCTION_KEY = "### Instruction:"
+ RESPONSE_KEY = "### Response:"
+ END_KEY = "### End"
+ INTRO_BLURB = (
+     "Below is an instruction that describes a task. Write a response that appropriately completes the request."
+ )
+
+ # This is the prompt that is used for generating responses using an already trained model. It ends with the response
+ # key, where the job of the model is to provide the completion that follows it (i.e. the response itself).
+ PROMPT_FOR_GENERATION_FORMAT = """{intro}
+ {instruction_key}
+ {instruction}
+ {response_key}
+ """.format(
+     intro=INTRO_BLURB,
+     instruction_key=INSTRUCTION_KEY,
+     instruction="{instruction}",
+     response_key=RESPONSE_KEY,
+ )
+
+
+ def get_special_token_id(tokenizer: PreTrainedTokenizer, key: str) -> int:
+     """Gets the token ID for a given string that has been added to the tokenizer as a special token.
+     When training, we configure the tokenizer so that the sequences like "### Instruction:" and "### End" are
+     treated specially and converted to a single, new token. This retrieves the token ID each of these keys map to.
+     Args:
+         tokenizer (PreTrainedTokenizer): the tokenizer
+         key (str): the key to convert to a single token
+     Raises:
+         ValueError: if more than one ID was generated
+     Returns:
+         int: the token ID for the given key
+     """
+     token_ids = tokenizer.encode(key)
+     if len(token_ids) > 1:
+         raise ValueError(f"Expected only a single token for '{key}' but found {token_ids}")
+     return token_ids[0]
+
+
+ class InstructionTextGenerationPipeline(Pipeline):
+     def __init__(
+         self, *args, do_sample: bool = True, max_new_tokens: int = 256, streamer: TextStreamer = None, top_p: float = 0.92, top_k: int = 0, **kwargs
+     ):
+         """Initialize the pipeline
+         Args:
+             do_sample (bool, optional): Whether or not to use sampling. Defaults to True.
+             max_new_tokens (int, optional): Max new tokens after the prompt to generate. Defaults to 256.
+             top_p (float, optional): If set to float < 1, only the smallest set of most probable tokens with
+                 probabilities that add up to top_p or higher are kept for generation. Defaults to 0.92.
+             top_k (int, optional): The number of highest probability vocabulary tokens to keep for top-k-filtering.
+                 Defaults to 0.
+         """
+         super().__init__(*args, do_sample=do_sample, max_new_tokens=max_new_tokens, top_p=top_p, top_k=top_k,
+                          **kwargs)
+         self.streamer = streamer
+
+     def _sanitize_parameters(self,
+                              return_full_text: bool = None,
+                              **generate_kwargs):
+         preprocess_params = {}
+
+         # newer versions of the tokenizer configure the response key as a special token. newer versions still may
+         # append a newline to yield a single token. find whatever token is configured for the response key.
+         tokenizer_response_key = next(
+             (token for token in self.tokenizer.additional_special_tokens if token.startswith(RESPONSE_KEY)), None
+         )
+
+         response_key_token_id = None
+         end_key_token_id = None
+         if tokenizer_response_key:
+             try:
+                 response_key_token_id = get_special_token_id(self.tokenizer, tokenizer_response_key)
+                 end_key_token_id = get_special_token_id(self.tokenizer, END_KEY)
+
+                 # Ensure generation stops once it generates "### End"
+                 generate_kwargs["eos_token_id"] = end_key_token_id
+             except ValueError:
+                 pass
+
+         forward_params = generate_kwargs
+         postprocess_params = {
+             "response_key_token_id": response_key_token_id,
+             "end_key_token_id": end_key_token_id
+         }
+
+         if return_full_text is not None:
+             postprocess_params["return_full_text"] = return_full_text
+
+         return preprocess_params, forward_params, postprocess_params
+
+     def preprocess(self, instruction_text, **generate_kwargs):
+         prompt_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction_text)
+         inputs = self.tokenizer(
+             prompt_text,
+             return_tensors="pt",
+         )
+         inputs["prompt_text"] = prompt_text
+         inputs["instruction_text"] = instruction_text
+         return inputs
+
+     def _forward(self, model_inputs, **generate_kwargs):
+         input_ids = model_inputs["input_ids"]
+         attention_mask = model_inputs.get("attention_mask", None)
+
+         if input_ids.shape[1] == 0:
+             input_ids = None
+             attention_mask = None
+             in_b = 1
+         else:
+             in_b = input_ids.shape[0]
+
+         generated_sequence = self.model.generate(
+             input_ids=input_ids.to(self.model.device),
+             attention_mask=attention_mask.to(self.model.device) if attention_mask is not None else None,
+             pad_token_id=self.tokenizer.pad_token_id,
+             streamer=self.streamer,
+             **generate_kwargs,
+         )
+
+         out_b = generated_sequence.shape[0]
+         if self.framework == "pt":
+             generated_sequence = generated_sequence.reshape(in_b, out_b // in_b, *generated_sequence.shape[1:])
+         elif self.framework == "tf":
+             generated_sequence = tf.reshape(generated_sequence, (in_b, out_b // in_b, *generated_sequence.shape[1:]))
+
+         instruction_text = model_inputs.pop("instruction_text")
+         return {"generated_sequence": generated_sequence, "input_ids": input_ids, "instruction_text": instruction_text}
+
+     def postprocess(self, model_outputs, response_key_token_id, end_key_token_id, return_full_text: bool = False):
+
+         generated_sequence = model_outputs["generated_sequence"][0]
+         instruction_text = model_outputs["instruction_text"]
+
+         generated_sequence: List[List[int]] = generated_sequence.numpy().tolist()
+         records = []
+         for sequence in generated_sequence:
+
+             # The response will be set to this variable if we can identify it.
+             decoded = None
+
+             # If we have token IDs for the response and end, then we can find the tokens and only decode between them.
+             if response_key_token_id and end_key_token_id:
+                 # Find where "### Response:" is first found in the generated tokens. Considering this is part of the
+                 # prompt, we should definitely find it. We will return the tokens found after this token.
+                 try:
+                     response_pos = sequence.index(response_key_token_id)
+                 except ValueError:
+                     logger.warning(f"Could not find response key {response_key_token_id} in: {sequence}")
+                     response_pos = None
+
+                 if response_pos:
+                     # Next find where "### End" is located. The model has been trained to end its responses with this
+                     # sequence (or actually, the token ID it maps to, since it is a special token). We may not find
+                     # this token, as the response could be truncated. If we don't find it then just return everything
+                     # to the end. Note that even though we set eos_token_id, we still see this token at the end.
+                     try:
+                         end_pos = sequence.index(end_key_token_id)
+                     except ValueError:
+                         end_pos = None
+
+                     decoded = self.tokenizer.decode(sequence[response_pos + 1 : end_pos]).strip()
+
+             if not decoded:
+                 # Otherwise we'll decode everything and use a regex to find the response and end.
+
+                 fully_decoded = self.tokenizer.decode(sequence)
+
+                 # The response appears after "### Response:". The model has been trained to append "### End" at the
+                 # end.
+                 m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", fully_decoded, flags=re.DOTALL)
+
+                 if m:
+                     decoded = m.group(1).strip()
+                 else:
+                     # The model might not generate the "### End" sequence before reaching the max tokens. In this case,
+                     # return everything after "### Response:".
+                     m = re.search(r"#+\s*Response:\s*(.+)", fully_decoded, flags=re.DOTALL)
+                     if m:
+                         decoded = m.group(1).strip()
+                     else:
+                         logger.warning(f"Failed to find response in:\n{fully_decoded}")
+
+             # If the full text is requested, then append the decoded text to the original instruction.
+             # This technically isn't the full text, as we format the instruction in the prompt the model has been
+             # trained on, but to the client it will appear to be the full text.
+             if return_full_text:
+                 decoded = f"{instruction_text}\n{decoded}"
+
+             rec = {"generated_text": decoded}
+
+             records.append(rec)
+
+         return records
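For reference (an illustrative snippet, not part of the committed file), the prompt string the pipeline assembles around each instruction is built from the constants at the top of this file:

```python
from instruct_pipeline import PROMPT_FOR_GENERATION_FORMAT

# The pipeline wraps every instruction in this template before tokenizing it.
print(PROMPT_FOR_GENERATION_FORMAT.format(instruction="Explain photosynthesis."))
# Below is an instruction that describes a task. Write a response that appropriately completes the request.
# ### Instruction:
# Explain photosynthesis.
# ### Response:
```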
special_tokens_map.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "additional_special_tokens": [
+     "### End",
+     "### Instruction:",
+     "### Response:"
+   ],
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 1000000000000000019884624838656,
+   "tokenizer_class": "GPTNeoXTokenizer",
+   "unk_token": "<|endoftext|>"
+ }