Is there any inference server which can support Phi-3-vision-128K-instruct?

#49 · opened by farzanehnakhaee70

Is there any inference server like Ollama or TGI which can support this model?

Maybe sglang can serve it. It supports llava-next, so I think a little bit of modification could get Phi-3-vision working.
Oh, vLLM also supports Phi-3-vision now. You can see the pull request here: https://github.com/vllm-project/vllm/pull/4986
You will need to install vLLM from source.
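For reference, a minimal offline-inference sketch with vLLM, assuming a build that already includes the Phi-3-vision support from that PR; the prompt template, image path, and `max_model_len` value here are illustrative placeholders, not taken from the PR:

```python
# Rough sketch: Phi-3-vision with vLLM's offline LLM API.
# Assumes a vLLM build that includes the Phi-3-vision support; the prompt
# template, image path, and max_model_len below are placeholders.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",
    trust_remote_code=True,   # the model ships custom modeling code
    max_model_len=8192,       # shrink the 128k context to fit a single GPU
)

# Phi-3-vision expects numbered image placeholders such as <|image_1|>.
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
image = Image.open("example.jpg")  # placeholder image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```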

@farzanehnakhaee70 we have support in mistral.rs with multi-batch inference, in-situ quantization (ISQ), and Python, OpenAI-compatible, and other APIs: https://github.com/EricLBuehler/mistral.rs/blob/master/docs%2FPHI3V.md
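If it helps, here is a rough Python sketch of querying a running mistral.rs server through its OpenAI-compatible endpoint; the port, API key, and model name are assumptions on my part, so check the linked PHI3V.md for the actual server invocation and values:

```python
# Rough sketch: talking to a mistral.rs server through its OpenAI-compatible
# API. The base_url, api_key, and model name are placeholders; see the
# PHI3V.md doc for the real server setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="phi3v",  # placeholder; use whatever model id the server reports
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/example.jpg"}},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```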

@EricB Thanks for the implementation, however it seems to be a bit slow even with ISQ. It takes about the same amount of time as just using the transformers library. Is there something I'm missing?
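(For context, a minimal sketch of the plain-transformers baseline such a timing comparison would run against, adapted from the model card usage; the image path and generation length are placeholders.)

```python
# Rough sketch of a transformers baseline for timing Phi-3-vision generation.
# Adapted from the model card usage; image path and token budget are placeholders.
import time
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto",
    trust_remote_code=True, _attn_implementation="eager",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image = Image.open("example.jpg")  # placeholder image
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

start = time.perf_counter()
generate_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = generate_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
print(f"{new_tokens.shape[1] / elapsed:.1f} tokens/s")
```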
