EmbeddedLLM/Phi-3-mini-4k-instruct-062024 ONNX

Model Summary

This model is an ONNX-optimized version of microsoft/Phi-3-mini-4k-instruct (June 2024), designed to provide accelerated inference on a variety of hardware using ONNX Runtime (CPU and DirectML). DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It provides GPU acceleration for a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
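
With the onnxruntime-directml package installed (step 5 below), you can confirm that the DirectML execution provider is visible to ONNX Runtime before loading the model. A minimal sanity check:

import onnxruntime as ort

# With onnxruntime-directml installed on a machine with a DirectX 12-capable GPU,
# "DmlExecutionProvider" should appear alongside "CPUExecutionProvider".
print(ort.get_available_providers())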

ONNX Models

Here are some of the optimized configurations we have added:

  • ONNX model for int4 DirectML: ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ (Activation-aware Weight Quantization).

Usage

Installation and Setup

To use the EmbeddedLLM/Phi-3-mini-4k-instruct-062024 ONNX model on Windows with DirectML, follow these steps:

  1. Create and activate a Conda environment:
conda create -n onnx python=3.10
conda activate onnx
  2. Install Git LFS:
winget install -e --id GitHub.GitLFS
  3. Install the Hugging Face CLI:
pip install huggingface-hub[cli]
  4. Download the model:
huggingface-cli download EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx --include="onnx/directml/Phi-3-mini-4k-instruct-062024-int4/*" --local-dir .\Phi-3-mini-4k-instruct-062024-int4
  5. Install the necessary Python packages:
pip install numpy==1.26.4
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml==0.3.0
  6. Install the Visual Studio 2015 runtime:
conda install conda-forge::vs2015_runtime
  7. Download the example script:
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
  8. Run the example script (a minimal standalone sketch of the same flow follows this list):
python phi3-qa.py -m .\Phi-3-mini-4k-instruct-062024-int4
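
If you would rather call the runtime directly than use phi3-qa.py, the sketch below distills the same flow with the onnxruntime-genai Python API (names per version 0.3.0, mirroring the phi3-qa.py example): load the model folder, encode a Phi-3-formatted prompt, and stream tokens from the generator. The prompt text and max_length value are illustrative.

import onnxruntime_genai as og

# Path from the download step above; adjust if you used a different --local-dir.
model = og.Model("./Phi-3-mini-4k-instruct-062024-int4")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Phi-3 chat template: wrap the user turn so the model replies as the assistant.
prompt = "<|user|>\nWhat is DirectML?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Stream-decode and print each new token as it is produced.
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()

The <|user|>/<|end|>/<|assistant|> markers are Phi-3's chat template; prompts without them tend to produce weaker instruction following.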

Hardware Requirements

Minimum Configuration:

  • Windows: DirectX 12-capable GPU (AMD/NVIDIA)
  • CPU: x86_64 / ARM64

Tested Configurations:

  • GPU: AMD Ryzen 8000 Series iGPU (DirectML)
  • CPU: AMD Ryzen CPU

Model Description

  • Developed by: Microsoft
  • Model type: ONNX
  • Language(s): Python, C, C++
  • License: Apache License Version 2.0
  • Model Description: This model is a conversion of Phi-3-mini-4k-instruct (June 2024) for ONNX Runtime inference, optimized for DirectML.