---
license: mit
---

# **Phi-3.5 Instruct OpenVINO INT4 Model**

<b><span style="text-decoration:underline">Note: This is an unofficial version, intended for testing and development only.</span></b>

This is an INT4-quantized OpenVINO version of Microsoft's Phi-3.5-mini-instruct, usable with the Intel OpenVINO toolkit through Optimum Intel. The model was exported with the following `optimum-cli` command:

```bash
optimum-cli export openvino \
    --model "microsoft/Phi-3.5-mini-instruct" \
    --task text-generation-with-past \
    --weight-format int4 --group-size 128 --ratio 0.6 --sym \
    --trust-remote-code \
    ./model/phi3.5-instruct/int4
```
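
Exporting and running the model requires the Optimum Intel integration with the OpenVINO backend. A typical install looks like the following; the exact extras name is an assumption, so check the Optimum Intel documentation for your version:

```bash
# Install Optimum with the OpenVINO backend (extras name assumed; see Optimum Intel docs)
pip install optimum[openvino]
```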

## **Sample Code**


```python
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_dir = 'Your Phi-3.5 OpenVINO Path'

# OpenVINO runtime hints: optimize for latency on a single inference stream.
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

# Load the INT4 OpenVINO model; set device='CPU' if no Intel GPU is available.
ov_model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device='GPU',
    ov_config=ov_config,
    config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
    trust_remote_code=True,
)

tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# The prompt below already contains the Phi-3.5 chat special tokens, so don't add more.
tokenizer_kwargs = {"add_special_tokens": False}

prompt = "<|user|>\nCan you introduce OpenVINO?\n<|end|><|assistant|>\n"

input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)

answer = ov_model.generate(**input_tokens, max_new_tokens=1024)

print(tok.batch_decode(answer, skip_special_tokens=True)[0])
```
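
Instead of hand-writing the special tokens, you can also build the prompt with the tokenizer's chat template. A minimal sketch, assuming `ov_model` and `tok` have been loaded as above:

```python
# Minimal sketch: build the prompt via the chat template instead of raw special tokens.
messages = [{"role": "user", "content": "Can you introduce OpenVINO?"}]

input_ids = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant tag so the model starts replying
    return_tensors="pt",
)

answer = ov_model.generate(input_ids, max_new_tokens=1024)
print(tok.batch_decode(answer, skip_special_tokens=True)[0])
```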