Update README.md

README.md (CHANGED)

Instruction-tuned version of the fully trained Open LLama 7B v2 model.

- This model performs better on code compared to v1 due to the improvements made to the base model by the openlm-research team.
- The instruction model is trained on an improved instruction-tuning dataset compared to v1.

**NOTE**: The model was trained using the Alpaca prompt template.

**NOTE**: The fast tokenizer produces incorrect encodings; set the `use_fast = False` parameter when instantiating the tokenizer.
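
To make these two notes concrete, here is a minimal sketch of loading the slow tokenizer and building an Alpaca-style prompt; the repository id and the exact template wording are assumptions based on the standard Alpaca format, not values quoted from this card:

```python
from transformers import AutoTokenizer

# Assumed repository id for this model; adjust if the card specifies a different one.
MODEL_ID = "VMware/open-llama-7b-v2-open-instruct"

# Standard Alpaca prompt template (assumed wording; the card's own example may differ).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

# use_fast=False: the fast tokenizer produces incorrect encodings for this model (see NOTE above).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

prompt = ALPACA_TEMPLATE.format(instruction="Explain what an attention mask is.")
inputs = tokenizer(prompt, return_tensors="pt")
print(inputs["input_ids"].shape)
```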

## License

- **Commercially Viable**

## Datasets used for Fine-Tuning

**Open-instruct**

**Open-instruct-v1**
- Mosaic/Dolly-HHRLHF + filtered OASST1 - cc-by-3.0

**Subset of COT SUBMIX (FROM FLAN V2) Zeroshot examples**
- ESNLI - MIT
- ECQA - CDLA 1.0 - Sharing
- Strategy - MIT
- gsm8k - MIT
- aqua - MIT
- qasc - Apache 2.0

- Language Model ([openlm-research/open_llama_v2_7b](https://huggingface.co/openlm-research/open_llama_v2_7b)) is under apache-2.0
- Dataset ([VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct)) is under cc-by-sa-3.0

[...]

- Model Size: 7B parameters
- Dataset: Open-instruct

## Use in Transformers

```
...
output = tokenizer.decode(output1[0])
print(output)
```
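
Since the example above is shown only in abbreviated form, here is a hedged end-to-end sketch of the same usage pattern; the repository id, dtype, device placement, and generation settings are assumptions rather than the card's exact values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "VMware/open-llama-7b-v2-open-instruct"  # assumed repository id

# Slow tokenizer, per the NOTE above.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

# device_map="auto" requires the `accelerate` package; drop it to load on CPU instead.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style prompt (assumed template wording).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain attention mechanisms in transformer models.\n\n### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding kept short for illustration; tune max_new_tokens as needed.
output1 = model.generate(**inputs, max_new_tokens=512)
output = tokenizer.decode(output1[0])
print(output)
```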

### Output

Sure, I can help you with that!

Attention mechanisms in transformer models are typically implemented using the attention mechanism in the self-attention layer. Self-attention allows the model to focus on different parts of the input sequence when processing it. This is achieved by computing a set of attention weights, which are used to weigh the contribution of each input element to the output.

[...]

I hope this helps!</s>
<hr>
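
As an aside, the attention weights the sample output describes correspond to standard scaled dot-product attention followed by a softmax; below is a generic NumPy sketch of that computation, illustrative only and not taken from the model's generated answer:

```python
import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention weights: softmax(keys @ query / sqrt(d))."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # one unnormalized score per input element
    scores = scores - scores.max()       # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
query = rng.normal(size=8)       # query vector, dimension 8
keys = rng.normal(size=(4, 8))   # 4 input elements, dimension 8 each
w = attention_weights(query, keys)
print(w, w.sum())                # 4 weights that sum to 1.0
```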

## Finetuning details
The finetuning scripts will be available in our [RAIL Github Repository](https://github.com/vmware-labs/research-and-development-artificial-intelligence-lab/tree/main/instruction-tuning).

## Evaluation

**TODO**