rbattle committed
Commit e8b80bb
1 Parent(s): 5cd3b4e

Update README.md

Files changed (1)
  1. README.md +15 -9
README.md CHANGED
@@ -14,20 +14,22 @@ Instruction-tuned version of the fully trained Open LLama 7B v2 model. The mode
  - This model performs better on code compared to v1 due to the improvements made on the base model by the openlm-research team.
  - The instruction model is trained on an improved instruction tuning dataset compared to v1

- <b> NOTE </b> : The model was trained using the Alpaca prompt template
- <b> NOTE </b> : Fast tokenizer results in incorrect encoding, set the ```use_fast = False``` parameter, when instantiating the tokenizer
+ **NOTE**: The model was trained using the Alpaca prompt template
+ **NOTE**: Fast tokenizer results in incorrect encoding, set the ```use_fast = False``` parameter, when instantiating the tokenizer
+

  ## License
- <b>Commercially Viable</b>
+ - **Commercially Viable**
+

  ## Datasets used for Fine-Tuning

- <b>Open-instruct</b>
+ **Open-instruct**

- <b>Open-instruct-v1</b>
+ **Open-instruct-v1**
  - Mosaic/Dolly-HHRLHF + filtered OASST1 - cc by 3.0

- <b>Subset of COT SUBMIX (FROM FLAN V2) Zeroshot examples</b>
+ **Subset of COT SUBMIX (FROM FLAN V2) Zeroshot examples**
  - ESNLI - MIT
  - ECQA - CDLA 1.0 - Sharing
  - Strategy - MIT
@@ -35,7 +37,6 @@ Instruction-tuned version of the fully trained Open LLama 7B v2 model. The mode
  - gsmk8 - MIT
  - aqua - MIT
  - qasc - Apache 2.0
- <br>
  - Language Model, ([openlm-research/open_llama_v2_7b](https://huggingface.co/openlm-research/open_llama_v2_7b)) is under apache-2.0
  - Dataset ([VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct)) is under cc-by-sa-3.0

@@ -46,6 +47,7 @@ Instruction-tuned version of the fully trained Open LLama 7B v2 model. The mode
  - Model Size: 7B parameters
  - Dataset: Open-instruct

+
  ## Use in Transformers

  ```
@@ -77,9 +79,10 @@ output = tokenizer.decode(output1[0])
  print(output)

  ```
- ### Output


+ ### Output
+
  Sure, I can help you with that!

  Attention mechanisms in transformer models are typically implemented using the attention mechanism in the self-attention layer. Self-attention allows the model to focus on different parts of the input sequence when processing it. This is achieved by computing a set of attention weights, which are used to weigh the contribution of each input element to the output.
@@ -129,8 +132,11 @@ The output of the `attention_weights` function is a NumPy tensor that represents
  I hope this helps!</s>
  <hr>

+
  ## Finetuning details
  The finetuning scripts will be available in our [RAIL Github Repository](https://github.com/vmware-labs/research-and-development-artificial-intelligence-lab/tree/main/instruction-tuning)
+
+
  ## Evaluation

- <B>TODO</B>
+ **TODO**
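
For readers who want to act on the two notes introduced in this commit (the Alpaca prompt template and `use_fast = False`), here is a minimal sketch of how they combine in practice. It is illustrative only: `MODEL_ID` is a placeholder for this repository's model id, and the prompt wording is the standard Alpaca instruction template, which the README names but does not reproduce.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "<this-repository>"  # placeholder: substitute this repository's model id

# Per the README note, the fast tokenizer encodes text incorrectly for this
# model, so the slow tokenizer is requested explicitly.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Standard Alpaca instruction template (assumed here; only its name appears
# in the README).
prompt_template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)
prompt = prompt_template.format(
    instruction="Explain attention mechanisms in transformer models."
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the tokens generated after the prompt.
print(tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

Sampling options such as `temperature` or `top_p` can be passed to `generate` as usual; the only requirements specific to this commit's notes are the slow tokenizer and the Alpaca-formatted prompt.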