Alibaba-NLP/gte-large-en-v1.5 · Different behavior between SentenceTransformer and TEI/Infinity when using gte-large-en-v1.5

System Info

Reproduction

# using TEI
# https://huggingface.co/docs/text-embeddings-inference/index

model=Alibaba-NLP/gte-large-en-v1.5
text-embeddings-router --model-id $model --port 8080
curl -X POST "http://localhost:8080/embeddings" \
     -H "Content-Type: application/json" \
     -d '{"input":["Dimension table for main account?"]}'

[
  -0.0006371783,
  -0.03931647,
  -0.010235489,
  -0.019322978,
  -0.014273809,
  0.022573953
]

# using infinity_emb
# https://github.com/michaelfeil/infinity

infinity_emb v2 --model-id Alibaba-NLP/gte-large-en-v1.5
curl -X POST "http://localhost:7997/embeddings" \
     -H "Content-Type: application/json" \
     -d '{"input": ["Dimension table for main account?"]}' \
     | jq '.data[0].embedding | .[:6]'

[
  -0.000593528151512146,
  -0.039367105811834335,
  -0.010303903371095657,
  -0.01923666149377823,
  -0.014310694299638271,
  0.02248678356409073
]

# using SentenceTransformer
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
embeddings = model.encode(['Dimension table for main account?'])
print(list(embeddings[0][:6]))

[-0.015188057, -0.9458093, -0.24485634, -0.4617836, -0.3435278, 0.53972]
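
A quick sanity check on the six values printed above, as a minimal sketch: if the SentenceTransformer output were just a rescaled copy of the TEI vector, the element-wise ratios should be roughly constant and the cosine similarity close to 1. The numbers are copied from this post; the helper name `cosine` is only for illustration. It looks at the first six dimensions only, so it is a hint rather than proof.

```python
import math

# First six dimensions copied from the outputs above.
tei = [-0.0006371783, -0.03931647, -0.010235489,
       -0.019322978, -0.014273809, 0.022573953]
st = [-0.015188057, -0.9458093, -0.24485634,
      -0.4617836, -0.3435278, 0.53972]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Element-wise ratio: roughly constant if one vector is a scaled copy of the other.
ratios = [s / t for s, t in zip(st, tei)]

print("ratios:", [round(r, 2) for r in ratios])
print("cosine:", round(cosine(tei, st), 6))
```

On these six values the ratios all land close to 24 and the cosine is very close to 1. If the same holds for the full vectors, the gap may be a normalization difference (the servers returning L2-normalized vectors, SentenceTransformer returning raw ones) rather than a different model. On the SentenceTransformer side, `model.encode(..., normalize_embeddings=True)` would be the thing to try.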

When using SentenceTransformer with trust_remote_code=True, it downloads code files (configuration.py, modeling.py) from a separate repo, Alibaba-NLP/new-impl (see the log below), while TEI/infinity_emb may be using the original model.

/home/smilencer/miniconda3/envs/ml/lib/python3.12/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
configuration.py: 7.13kB [00:00, 25.2MB/s]
A new version of the following files was downloaded from https://huggingface.co/Alibaba-NLP/new-impl:
- configuration.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling.py: 59.0kB [00:00, 350kB/s]
A new version of the following files was downloaded from https://huggingface.co/Alibaba-NLP/new-impl:
- modeling.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.

Is there any way to make TEI/infinity_emb use Alibaba-NLP/new-impl?
I tried modifying the repo files as described in https://huggingface.co/Alibaba-NLP/new-impl/discussions/2, but it did not work.

Expected behavior

The embedding results should be the same across SentenceTransformer, TEI, and infinity_emb.