Text Generation
Transformers
PyTorch
codegen
Inference Endpoints
bol20162021 committed on
Commit 4137166 · 1 Parent(s): 1da9502
Files changed (1)
  1. README.md +7 -19
README.md CHANGED
@@ -1,11 +1,11 @@
  ---
  license: bsd-3-clause
  ---
- # codegen-16B-action
+ # codegen-16B-mono-toolbench

  <!-- Provide a quick summary of what the model is/does. -->

- codegen-16B-action is a 16-billion-parameter model for API-based action generation. It is instruction-tuned from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) on API-based action generation datasets.
+ codegen-16B-mono-toolbench is a 16-billion-parameter model for API-based action generation. It is instruction-tuned from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) on API-based action generation datasets.

  ## Model Details

@@ -45,7 +45,7 @@ This model is intended for commercial and research use.
  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->


- codegen-16B-action should NOT be used for purposes other than API-based action generation.
+ codegen-16B-mono-toolbench should NOT be used for purposes other than API-based action generation.

  ### Recommendations

@@ -67,8 +67,8 @@ Users should be made aware of the risks, biases, limitations, and restrictions o
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

- tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/codegen-16b-action")
- model = AutoModelForCausalLM.from_pretrained("sambanovasystems/codegen-16b-action", device_map="auto", torch_dtype="auto")
+ tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/codegen-16b-mono-toolbench")
+ model = AutoModelForCausalLM.from_pretrained("sambanovasystems/codegen-16b-mono-toolbench", device_map="auto", torch_dtype="auto")
  ```
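As a quick, non-authoritative illustration of how the loaded checkpoint can be called, here is a minimal generation sketch. The prompt wording is a made-up example (the card does not document the expected prompt layout), and the decoding settings are placeholders; see the suggested inference parameters below for recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/codegen-16b-mono-toolbench")
model = AutoModelForCausalLM.from_pretrained(
    "sambanovasystems/codegen-16b-mono-toolbench", device_map="auto", torch_dtype="auto"
)

# Hypothetical prompt: the exact prompt format for API-based action generation is not documented here.
prompt = "Instruction: find flights from SFO to JFK on May 3.\nAction:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Placeholder decoding settings; prefer the card's suggested inference parameters.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```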

  ### Suggested Inference Parameters
@@ -108,7 +108,7 @@ Input text: 十七岁的风是什么颜色的?

  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- We trained codegen-16b-action on 4 × 80GB A100 GPUs. We started from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) and finetuned it on the XXX dataset.
+ We trained codegen-16b-mono-toolbench on 4 × 80GB A100 GPUs. We started from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) and finetuned it on the XXX dataset.
  All of the code used to prepare the datasets and the scripts to run training and inference are open-sourced and freely available at [githublink here](dummy link).


@@ -129,19 +129,7 @@ All of the code used to prepare the datasets and the scripts to run training and
  - Learning Rate Scheduler: Fixed LR
  - Weight decay: 0.1

- **Instruction-tuned Training on Dolly 2.0 and Oasst1**

- - Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- - Optimizer: AdamW
- - Grad accumulation: 1
- - Epochs: 3
- - Global Batch size: 128
- - Batch tokens: 128 * 2048 = 262,144 tokens
- - Learning Rate: 1e-5
- - Learning Rate Scheduler: Cosine Schedule with Warmup
- - Warmup Steps: 0
- - End Learning Ratio: 0.1
- - Weight decay: 0.1

  </details>
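To make the training procedure above more concrete, the sketch below maps the stated setup (starting from codegen-16B-mono, Fixed LR, weight decay 0.1) onto `transformers.Trainer`. It is a minimal illustration under assumptions, not the training code referenced in this card: the dataset, batch size, epoch count, and learning rate are placeholders, and a 16B-parameter model on 4 × A100s would additionally need parameter sharding (e.g. FSDP or DeepSpeed), which is omitted here.

```python
# Illustrative sketch only -- not the card's actual training script.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")

# Placeholder: the tokenized API-based action generation data ("XXX dataset" above).
train_dataset = ...

args = TrainingArguments(
    output_dir="codegen-16b-mono-toolbench",
    per_device_train_batch_size=1,   # placeholder; the global batch size is not stated for this run
    gradient_accumulation_steps=8,   # placeholder
    learning_rate=1e-5,              # assumed value, not documented for this run
    lr_scheduler_type="constant",    # "Fixed LR" from the hyperparameters above
    weight_decay=0.1,                # from the hyperparameters above
    num_train_epochs=3,              # placeholder
    bf16=True,                       # assumption for A100 training
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```
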
@@ -150,7 +138,7 @@ All of the code used to prepare the datasets and the scripts to run training and
  ## Acknowledgment


- ## Cite codegen-16b-action
+ ## Cite codegen-16b-mono-toolbench
  ```
  @software{bloomchat,
  title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
 