Is there any inference server which can support Phi-3-vision-128K-instruct?

#49 · opened by farzanehnakhaee70

Is there any inference server like Ollama or TGI which can support this model?

Maybe sglang can serve it. It supports llava-next, so I think a little bit of modification could get Phi-3-vision working.
Oh, vLLM also supports Phi-3-vision now. You can see the pull request here: https://github.com/vllm-project/vllm/pull/4986
You will need to install vLLM from source.
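For reference, a minimal offline-inference sketch with vLLM, assuming a build that already includes the Phi-3-vision support from that PR; the prompt template, image path, and `max_model_len` value here are illustrative placeholders, not taken from the PR:

```python
# Rough sketch: Phi-3-vision with vLLM's offline LLM API.
# Assumes a vLLM build that includes the Phi-3-vision support; the prompt
# template, image path, and max_model_len below are placeholders.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",
    trust_remote_code=True,   # the model ships custom modeling code
    max_model_len=8192,       # shrink the 128k context to fit a single GPU
)

# Phi-3-vision expects numbered image placeholders such as <|image_1|>.
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
image = Image.open("example.jpg")  # placeholder image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```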

@farzanehnakhaee70 we have support in mistral.rs with multi-batch inference, in-situ quantization (ISQ), and Python, OpenAI-compatible, and other APIs: https://github.com/EricLBuehler/mistral.rs/blob/master/docs%2FPHI3V.md
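If it helps, here is a rough Python sketch of querying a running mistral.rs server through its OpenAI-compatible endpoint; the port, API key, and model name are assumptions on my part, so check the linked PHI3V.md for the actual server invocation and values:

```python
# Rough sketch: talking to a mistral.rs server through its OpenAI-compatible
# API. The base_url, api_key, and model name are placeholders; see the
# PHI3V.md doc for the real server setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="phi3v",  # placeholder; use whatever model id the server reports
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/example.jpg"}},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```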

@EricB Thanks for the implementation, however it seems to be a bit slow even with ISQ. It takes about the same amount of time as just using the transformers library. Is there something I'm missing?
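(For context, a minimal sketch of the plain-transformers baseline such a timing comparison would run against, adapted from the model card usage; the image path and generation length are placeholders.)

```python
# Rough sketch of a transformers baseline for timing Phi-3-vision generation.
# Adapted from the model card usage; image path and token budget are placeholders.
import time
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto",
    trust_remote_code=True, _attn_implementation="eager",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image = Image.open("example.jpg")  # placeholder image
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

start = time.perf_counter()
generate_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = generate_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
print(f"{new_tokens.shape[1] / elapsed:.1f} tokens/s")
```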
