lokinfey/Phi-3.5-mini-instruct-ov-int4

lokinfey committed on Aug 27, 2024 · Commit bf68955 · Parent 4845478
Update README.md · README.md +47 -3

---
license: mit
---
# **Phi-3.5 Instruct OpenVINO INT4 Model**
This is the OpenVINO-format, INT4-quantized version of Microsoft Phi-3.5-mini-instruct, for use with the Intel OpenVINO toolkit. It can be reproduced with the Optimum Intel CLI:
```bash
optimum-cli export openvino --model "microsoft/Phi-3.5-mini-instruct" \
  --task text-generation-with-past \
  --weight-format int4 --group-size 128 --ratio 0.6 --sym \
  --trust-remote-code \
  ./model/phi3.5-instruct/int4
```
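The `--sym` and `--group-size 128` flags request symmetric INT4 weight compression with one scale per group of 128 weights, while `--ratio 0.6` asks for roughly 60% of the layers in INT4 with the rest left at INT8. The toy sketch below illustrates only the symmetric group-wise part in plain Python. It is a generic scheme written for illustration: the helper names are hypothetical, the scale formula (`max|w| / 7`) is an assumption, and it is not NNCF's exact algorithm.

```python
# Toy sketch of symmetric, group-wise INT4 weight quantization.
# Assumption: scale = max|w| / 7 per group, codes clipped to the signed 4-bit
# range [-8, 7]. Real NNCF/OpenVINO weight compression may differ in details.


def quantize_int4_sym(weights: list[float], group_size: int = 128) -> tuple[list[int], list[float]]:
    """Map a flat row of weights to signed 4-bit codes with one scale per group."""
    codes: list[int] = []
    scales: list[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric scheme: the zero point is fixed at 0, so only a scale is stored.
        # Dividing by 7 keeps both +max_abs and -max_abs representable.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4_sym(codes: list[int], scales: list[float], group_size: int = 128) -> list[float]:
    """Approximate the original weights as code * (scale of the code's group)."""
    return [code * scales[i // group_size] for i, code in enumerate(codes)]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.12, 0.0, -1.4, 0.2, 0.6, -0.05]
    codes, scales = quantize_int4_sym(weights, group_size=4)
    restored = dequantize_int4_sym(codes, scales, group_size=4)
    print("codes:    ", codes)
    print("scales:   ", scales)
    print("max error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

Smaller groups follow local weight ranges more closely (lower error) at the cost of storing more scales; that is the trade-off `--group-size` controls.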
## **Sample Code**
```python
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

# Path to the downloaded OpenVINO INT4 model directory
model_dir = 'Your Phi-3.5 OpenVINO Path'

# Optimize for single-request latency; an empty CACHE_DIR disables the model cache
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

ov_model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device='GPU',  # Intel GPU; use 'CPU' if no supported GPU is available
    ov_config=ov_config,
    config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
    trust_remote_code=True,
)
tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# The prompt already contains the chat markup, so the tokenizer must not add special tokens
tokenizer_kwargs = {"add_special_tokens": False}
prompt = "<|user|>\nCan you introduce OpenVINO?<|end|>\n<|assistant|>\n"
input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)
answer = ov_model.generate(**input_tokens, max_new_tokens=1024)

# generate() returns prompt + completion; decode only the newly generated tokens
new_tokens = answer[:, input_tokens["input_ids"].shape[1]:]
print(tok.batch_decode(new_tokens, skip_special_tokens=True)[0])
```
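The sample hard-codes a single-turn prompt in Phi-3.5's chat markup. For multi-turn conversations, the same markup can be assembled from a message list. The helper below is a hypothetical sketch, not an API of `transformers` or `optimum`; it assumes the `<|role|>\ncontent<|end|>\n` layout of the chat template shipped with Phi-3.5-mini-instruct. In practice, `tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders whatever template the tokenizer actually ships, which is the safer choice.

```python
# Hypothetical helper (not part of transformers or optimum): assembles a
# Phi-3.5 chat prompt by hand. Assumes the "<|role|>\n...<|end|>\n" layout
# used by the chat template in the model's tokenizer_config.json.

ALLOWED_ROLES = ("system", "user", "assistant")


def build_phi35_prompt(messages: list[dict[str, str]]) -> str:
    """Format role/content messages and end with the assistant generation cue."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    # The trailing cue makes the model continue as the assistant.
    parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Can you introduce OpenVINO?"},
        {"role": "assistant", "content": "OpenVINO is an open-source toolkit for optimizing and deploying AI inference."},
        {"role": "user", "content": "Which devices does it support?"},
    ]
    print(build_phi35_prompt(history))
```

Pass the returned string to `tok(...)` with `add_special_tokens=False`, as in the sample code.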