update context length
Browse files
README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1 |
---
|
2 |
pipeline_tag: text-generation
|
3 |
-
base_model: ibm-granite/granite-8b-code-base
|
4 |
inference: false
|
5 |
license: apache-2.0
|
6 |
datasets:
|
@@ -19,7 +19,7 @@ tags:
|
|
19 |
- code
|
20 |
- granite
|
21 |
model-index:
|
22 |
-
- name: granite-8b-code-instruct
|
23 |
results:
|
24 |
- task:
|
25 |
type: text-generation
|
@@ -205,10 +205,10 @@ model-index:
|
|
205 |
|
206 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png)
|
207 |
|
208 |
-
# Granite-8B-Code-Instruct
|
209 |
|
210 |
## Model Summary
|
211 |
-
**Granite-8B-Code-Instruct** is a 8B parameter model fine tuned from *Granite-8B-Code-Base* on a combination of **permissively licensed** instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.
|
212 |
|
213 |
- **Developers:** IBM Research
|
214 |
- **GitHub Repository:** [ibm-granite/granite-code-models](https://github.com/ibm-granite/granite-code-models)
|
@@ -223,13 +223,13 @@ The model is designed to respond to coding related instructions and can be used
|
|
223 |
<!-- TO DO: Check starcoder2 instruct code example that includes the template https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1 -->
|
224 |
|
225 |
### Generation
|
226 |
-
This is a simple example of how to use **Granite-8B-Code-Instruct** model.
|
227 |
|
228 |
```python
|
229 |
import torch
|
230 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
231 |
device = "cuda" # or "cpu"
|
232 |
-
model_path = "ibm-granite/granite-8b-code-instruct"
|
233 |
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
234 |
# drop device_map if running on CPU
|
235 |
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
|
@@ -265,4 +265,4 @@ Granite Code Instruct models are trained on the following types of data.
|
|
265 |
We train the Granite Code models using two of IBM's super computing clusters, namely Vela and Blue Vela, both outfitted with NVIDIA A100 and H100 GPUs respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs.
|
266 |
|
267 |
## Ethical Considerations and Limitations
|
268 |
-
Granite code instruct models are primarily finetuned using instruction-response pairs across a specific set of programming languages. Thus, their performance may be limited with out-of-domain programming languages. In this situation, it is beneficial providing few-shot examples to steer the model's output. Moreover, developers should perform safety testing and target-specific tuning before deploying these models on critical applications. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to *[Granite-8B-Code-Base](https://huggingface.co/ibm-granite/granite-8b-code-base)* model card.
|
|
|
1 |
---
|
2 |
pipeline_tag: text-generation
|
3 |
+
base_model: ibm-granite/granite-8b-code-base-4k
|
4 |
inference: false
|
5 |
license: apache-2.0
|
6 |
datasets:
|
|
|
19 |
- code
|
20 |
- granite
|
21 |
model-index:
|
22 |
+
- name: granite-8b-code-instruct-4k
|
23 |
results:
|
24 |
- task:
|
25 |
type: text-generation
|
|
|
205 |
|
206 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png)
|
207 |
|
208 |
+
# Granite-8B-Code-Instruct-4K
|
209 |
|
210 |
## Model Summary
|
211 |
+
**Granite-8B-Code-Instruct-4K** is a 8B parameter model fine tuned from *Granite-8B-Code-Base-4K* on a combination of **permissively licensed** instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.
|
212 |
|
213 |
- **Developers:** IBM Research
|
214 |
- **GitHub Repository:** [ibm-granite/granite-code-models](https://github.com/ibm-granite/granite-code-models)
|
|
|
223 |
<!-- TO DO: Check starcoder2 instruct code example that includes the template https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1 -->
|
224 |
|
225 |
### Generation
|
226 |
+
This is a simple example of how to use **Granite-8B-Code-Instruct-4K** model.
|
227 |
|
228 |
```python
|
229 |
import torch
|
230 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
231 |
device = "cuda" # or "cpu"
|
232 |
+
model_path = "ibm-granite/granite-8b-code-instruct-4k"
|
233 |
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
234 |
# drop device_map if running on CPU
|
235 |
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
|
|
|
265 |
We train the Granite Code models using two of IBM's super computing clusters, namely Vela and Blue Vela, both outfitted with NVIDIA A100 and H100 GPUs respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs.
|
266 |
|
267 |
## Ethical Considerations and Limitations
|
268 |
+
Granite code instruct models are primarily finetuned using instruction-response pairs across a specific set of programming languages. Thus, their performance may be limited with out-of-domain programming languages. In this situation, it is beneficial providing few-shot examples to steer the model's output. Moreover, developers should perform safety testing and target-specific tuning before deploying these models on critical applications. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to *[Granite-8B-Code-Base-4K](https://huggingface.co/ibm-granite/granite-8b-code-base-4k)* model card.
|