Commit bf68955 (verified, parent 4845478) by lokinfey: Update README.md
Files changed (1): README.md (+47 −3)
---
license: mit
---

# **Phi-3.5 Instruct OpenVINO INT4 Model**

This is the OpenVINO-format INT4 quantized version of Microsoft's Phi-3.5-mini-instruct model. You can use it with the Intel OpenVINO toolkit.

The model was exported and quantized with `optimum-cli`:

```bash
optimum-cli export openvino --model "microsoft/Phi-3.5-mini-instruct" --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.6 --sym --trust-remote-code ./model/phi3.5-instruct/int4
```

## **Sample Code**

```python
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_dir = 'Your Phi-3.5 OpenVINO Path'

# Compile for low-latency, single-stream inference; leave the model cache disabled
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

ov_model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device='GPU',
    ov_config=ov_config,
    config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
    trust_remote_code=True,
)

tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# The prompt below already contains Phi-3.5's special tokens,
# so the tokenizer should not add them again
tokenizer_kwargs = {"add_special_tokens": False}

prompt = "<|user|>\nCan you introduce OpenVINO?\n<|end|><|assistant|>\n"

input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)

answer = ov_model.generate(**input_tokens, max_new_tokens=1024)

print(tok.batch_decode(answer, skip_special_tokens=True)[0])
```
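The hard-coded prompt in the sample follows Phi-3.5's chat markup (`<|user|>`, `<|end|>`, `<|assistant|>`). As a minimal sketch of how multi-turn prompts can be assembled in that format (the `build_phi35_prompt` helper is our own illustration, not part of the model card or any library), something like the following produces the same string as the sample above for a single user turn:

```python
def build_phi35_prompt(messages):
    """Format a list of {"role", "content"} dicts using Phi-3.5's chat tags.

    Hypothetical helper for illustration; for production use, prefer the
    tokenizer's built-in chat template if one is available.
    """
    parts = []
    for m in messages:
        # Each turn is wrapped as <|role|>\n<content>\n<|end|>
        parts.append(f"<|{m['role']}|>\n{m['content']}\n<|end|>")
    # End with the assistant tag so the model continues as the assistant
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_phi35_prompt(
    [{"role": "user", "content": "Can you introduce OpenVINO?"}]
)
# Same string as the prompt in the sample code above
```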