Performant Inference Frameworks

#6 by nazrak - opened

I can't think of a better model to ask this about than one developed by Nvidia!

Are there any more performant inference frameworks this model is compatible with (and if so, can they be added to the model card)? Specifically, is this compatible with any plug-and-play frameworks like HF's https://github.com/huggingface/text-embeddings-inference, or can it be compiled via TensorRT-LLM?

I think you can use vLLM: https://docs.vllm.ai/en/stable/getting_started/examples/offline_inference_embedding.html
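
For reference, the linked example boils down to roughly the following (a minimal sketch assuming a model vLLM supports natively, such as intfloat/e5-mistral-7b-instruct used here for illustration; exact API details vary by vLLM version):

```python
# Minimal sketch of vLLM offline embedding inference, assuming a natively
# supported embedding model. NV-Embed-v1 ships custom modeling code, so
# this does not apply to it as-is.
from vllm import LLM

prompts = [
    "What is the capital of France?",
    "Paris is the capital and largest city of France.",
]

# Load an embedding model; vLLM pools hidden states into one vector per prompt.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

outputs = llm.encode(prompts)
for prompt, output in zip(prompts, outputs):
    embedding = output.outputs.embedding  # list[float]
    print(f"{prompt!r} -> embedding of dim {len(embedding)}")
```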

Never mind, it uses custom code, so this won't work. I thought it was a generic Mistral model.

You could try their NIM: https://build.nvidia.com/nvidia/nv-embed-v1

NVIDIA org

Hi @nazrak, thank you for asking the question. This specific model will not be supported by NIM due to its non-commercial license. Instead, NIM supports NVIDIA's commercially available embedding models, listed at the following link: https://build.nvidia.com/explore/retrieval
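
For what it's worth, the catalog models are exposed through an OpenAI-compatible API, so calling one looks roughly like this (a hedged sketch: the base URL, the model id, and the `input_type` field are assumptions drawn from the API catalog and may differ for the model you pick):

```python
# Hedged sketch of calling a NIM-hosted embedding model from the NVIDIA API
# catalog via its OpenAI-compatible endpoint. Model id and extra fields below
# are illustrative; check build.nvidia.com for the exact values, and set
# NVIDIA_API_KEY in your environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed catalog endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",  # hypothetical model id from the catalog
    input=["What is retrieval-augmented generation?"],
    # NVIDIA retrieval models distinguish query vs. passage embeddings.
    extra_body={"input_type": "query", "truncate": "NONE"},
)

print(len(response.data[0].embedding))
```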

nada5 changed discussion status to closed
