Which embedding vector to use?
Hi,
We would like to use the model for document embedding and retrieval. We noticed, though, that the model returns 3 different vectors (all 1024-dimensional). What is the difference between them, and which one should we use for retrieval with cosine similarity? Or do we have to compute the feature vector from those 3 in a separate step?
I'm not sure I understand the problem: the code at https://huggingface.co/intfloat/e5-large-v2#usage produces only one vector per input text.
Can you provide more details on how you compute the embeddings?
Thank you for the quick reply! What I mean is that when we deploy the model to "Inference Endpoints" or use the "Hosted inference API" on the Hugging Face model page, it outputs multiple vectors.
So it's unclear to us what the output of the "Hosted inference API" represents. For example, for the input "passage: E5 is awesome", it returns 8 vectors like this:
[
  [
    [1024],
    [1024],
    [1024],
    [1024],
    [1024],
    [1024],
    [1024],
    [1024]
  ]
]
Is this one vector per token?
Do I understand correctly that we would have to average them to get a single feature vector?
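A minimal sketch of that averaging on the client side, assuming the response is the nested list shown above (shape [1, num_tokens, 1024]) and that it was produced from a single, unpadded input. In that case a plain mean over the token axis gives the same result as the attention-mask-weighted average used in the model card's demo code, because every mask entry is 1. The function name pool_api_response is hypothetical, and random numbers stand in for a real response.

```python
import numpy as np


def pool_api_response(response):
    """Mean-pool a feature-extraction response of shape [1, num_tokens, dim].

    Assumes a single, unpadded input, so every token vector is averaged
    (equivalent to masked averaging with an all-ones attention mask).
    """
    hidden = np.asarray(response, dtype=np.float32)  # (1, num_tokens, dim)
    if hidden.ndim != 3 or hidden.shape[0] != 1:
        raise ValueError(f"expected shape (1, num_tokens, dim), got {hidden.shape}")
    return hidden[0].mean(axis=0)  # average over tokens -> (dim,)


# Stand-in for the real API output: 8 token vectors of 1024 floats each.
rng = np.random.default_rng(0)
fake_response = rng.normal(size=(1, 8, 1024)).tolist()
embedding = pool_api_response(fake_response)
print(embedding.shape)  # (1024,)
```

For batched requests where inputs are padded to a common length, a plain mean would also average over padding positions, so a masked average is needed instead.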
This inference API is set up automatically by Hugging Face; it looks like it returns the last-layer hidden states, i.e. one vector per token.
Yes, please follow our demo code to average them into one vector and use cosine similarity for retrieval.
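The averaging-plus-cosine-similarity recipe can be sketched in NumPy as below. This is an illustrative re-implementation, not the model card's demo code itself (which uses PyTorch): average_pool takes a masked mean over the token axis so padding positions are ignored, vectors are L2-normalized, and a dot product of unit vectors then equals cosine similarity. The helper names l2_normalize and rank_passages and the toy 2-dimensional data are assumptions made for the example.

```python
import numpy as np


def average_pool(last_hidden_states, attention_mask):
    """Masked mean over the token axis.

    last_hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    Positions where the mask is 0 (padding) are excluded from the average.
    """
    mask = attention_mask[..., None].astype(last_hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_states * mask).sum(axis=1)  # (batch, dim)
    counts = mask.sum(axis=1)  # number of real tokens per row, (batch, 1)
    return summed / counts


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def rank_passages(query_emb, passage_embs):
    """Return passage indices sorted by cosine similarity to the query, best first."""
    q = l2_normalize(query_emb)
    p = l2_normalize(passage_embs)
    scores = p @ q  # cosine similarity, since both sides have unit length
    order = np.argsort(-scores)
    return order, scores[order]


# Toy batch: the third token is padding (mask 0), so it is ignored.
hidden = np.array([[[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = average_pool(hidden, mask)
print(pooled.tolist())  # [[2.0, 0.0]]

# Toy retrieval: rank three passage embeddings against one query embedding.
query = np.array([1.0, 0.0])
passages = np.array([[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]])
order, scores = rank_passages(query, passages)
print(order.tolist())  # [1, 2, 0]
```

Normalizing once up front means the similarity search reduces to a matrix product, which is convenient if the passage embeddings are stored in a vector index.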
Is it possible to edit the Hugging Face model to do that instead of having clients do it?
I am not aware of any way to do this automatically. Prepending a string prefix should be fairly trivial to do on the client side.
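A small sketch of client-side prefixing, assuming the E5 convention of "query: " for queries and "passage: " for documents (as in the "passage: E5 is awesome" example above). The function add_e5_prefix and its kind parameter are hypothetical, and the {"inputs": ...} payload shape is an assumption about how the request to the endpoint is built.

```python
def add_e5_prefix(texts, kind):
    """Prepend the E5 input prefix ("query: " or "passage: ") to each text.

    `kind` is a hypothetical parameter for this sketch and must be
    "query" or "passage". Texts that already carry the prefix are left as-is.
    """
    if kind not in ("query", "passage"):
        raise ValueError(f"kind must be 'query' or 'passage', got {kind!r}")
    prefix = f"{kind}: "
    return [t if t.startswith(prefix) else prefix + t for t in texts]


# Assumed request body for a feature-extraction endpoint.
payload = {"inputs": add_e5_prefix(["E5 is awesome"], "passage")}
print(payload)  # {'inputs': ['passage: E5 is awesome']}
```

Skipping texts that already start with the prefix avoids accidentally sending "passage: passage: ..." when callers prefix their inputs themselves.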
The same thing happens when deploying it with HuggingFaceModel (from sagemaker.huggingface.model import HuggingFaceModel).