Phi-3.5-mini-instruct-onnx model for DirectML.

#4 opened by Charlie-In-TW

Hi,

Will you release the Phi-3.5-mini-instruct-onnx model for DirectML (like Phi-3-mini-4k-instruct-onnx)? Or do you have any suggestions on how to convert Phi-3.5-mini-instruct to a DirectML ONNX version?

Hi, I found your previous discussion about this:

--
With the newly uploaded INT4 AWQ models, there is now one optimized ONNX model for CPU and one optimized ONNX model for GPU (e.g. CUDA, DirectML). Here is a tutorial you can follow to create your own INT4 AWQ ONNX models.
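For reference, a minimal sketch of driving the model builder to produce an INT4 model targeting DirectML. It assumes the `onnxruntime-genai` DirectML package is installed; the output folder name is hypothetical, and the flags follow the model builder's documented CLI:

```python
# Build an INT4 ONNX model of Phi-3.5-mini-instruct for the DirectML
# execution provider using ONNX Runtime GenAI's model builder.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-3.5-mini-instruct",   # Hugging Face model id
        "-o", "./phi-3.5-mini-instruct-int4-dml",  # output folder (hypothetical)
        "-p", "int4",                              # INT4 weight precision
        "-e", "dml",                               # target the DirectML provider
    ],
    check=True,
)
```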

For INT8 precision, you can create the FP32 ONNX model using ONNX Runtime GenAI's model builder and then use ONNX Runtime's INT8 quantization tools.
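And a minimal sketch of that INT8 route: first produce an FP32 model with the builder (`-p fp32` and `-e cpu` in the command above), then quantize it with ONNX Runtime's dynamic quantization API. The paths below are hypothetical:

```python
# Dynamically quantize a builder-produced FP32 ONNX model to INT8 weights.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="./phi-3.5-mini-instruct-fp32/model.onnx",   # FP32 model from the builder
    model_output="./phi-3.5-mini-instruct-int8/model.onnx",  # INT8 output
    weight_type=QuantType.QInt8,                             # quantize weights to INT8
)
```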
