---
license: mit
---
# **Phi-3.5 Instruct OpenVINO INT4 Model**
<b><span style="text-decoration:underline">Note: this is an unofficial version, intended only for testing and development.</span></b>

This is an INT4-quantized OpenVINO version of Microsoft Phi-3.5-mini-instruct. You can run it with the Intel OpenVINO toolkit.
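Exporting the model yourself requires `optimum-intel` with its OpenVINO extras (one possible setup; exact package versions may differ):

```bash
pip install optimum[openvino]
```

The INT4 weights were produced with the following `optimum-cli` export command: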
```bash
optimum-cli export openvino --model "microsoft/Phi-3.5-mini-instruct" --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.6 --sym --trust-remote-code ./model/phi3.5-instruct/int4
```
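In this command, `--weight-format int4` compresses the weights to 4-bit precision, `--group-size 128` sets the quantization group size, `--ratio 0.6` quantizes roughly 60% of the layers to INT4 while keeping the rest in INT8 for accuracy, and `--sym` selects symmetric quantization (as documented for `optimum-intel` weight compression).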
## **Sample Code**
```python
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_dir = 'Your Phi-3.5 OpenVINO Path'

# Hint the runtime towards low-latency, single-stream inference;
# an empty CACHE_DIR disables the compiled-model cache
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

ov_model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device='GPU',
    ov_config=ov_config,
    config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
    trust_remote_code=True,
)

tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# The chat markers are written into the prompt by hand, so the tokenizer
# must not add special tokens again
tokenizer_kwargs = {"add_special_tokens": False}
prompt = "<|user|>\nCan you introduce OpenVINO?\n<|end|><|assistant|>\n"

input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)
answer = ov_model.generate(**input_tokens, max_new_tokens=1024)
print(tok.batch_decode(answer, skip_special_tokens=True)[0])
```
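To stream tokens as they are generated (useful for interactive demos), the standard `transformers` `TextStreamer` can be passed to `generate`; a minimal sketch, reusing `ov_model`, `tok`, and `input_tokens` from the snippet above:

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are produced, skipping the prompt
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
ov_model.generate(**input_tokens, max_new_tokens=1024, streamer=streamer)
```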