Based on aapot/bge-m3-onnx and philipchung/bge-m3-onnx.

All three output vectors (dense, sparse, and ColBERT) are supported.

Deploy with Triton Inference Server (`tritonserver`)
- Folder structure

```
.
└── model_repository
    └── bge-m3
        ├── 1
        │   ├── model.onnx
        │   └── model.onnx.data
        └── config.pbtxt
```
- `config.pbtxt` file

```
name: "bge-m3"
backend: "onnxruntime"
max_batch_size: 4
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "dense_vecs"
    data_type: TYPE_FP32
    dims: [ 1024 ]
  },
  {
    name: "sparse_vecs"
    data_type: TYPE_FP32
    dims: [ -1, 1 ]
  },
  {
    name: "colbert_vecs"
    data_type: TYPE_FP32
    dims: [ -1, 1024 ]
  }
]
```
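Because `max_batch_size` is greater than zero, Triton prepends an implicit batch dimension to every `dims` entry, and each `-1` is resolved at request time by the padded sequence length. A minimal sketch of the resulting runtime shapes (the values of `B` and `T` below are illustrative, not fixed by the config):

```python
# Runtime tensor shapes implied by the config above. With max_batch_size > 0,
# Triton adds an implicit leading batch dimension, and each -1 resolves to the
# padded sequence length of the request. B and T are illustrative values.
B, T = 4, 128

shapes = {
    "input_ids":      (B, T),        # dims: [ -1 ]       -> [B, T]
    "attention_mask": (B, T),        # dims: [ -1 ]       -> [B, T]
    "dense_vecs":     (B, 1024),     # dims: [ 1024 ]     -> [B, 1024]
    "sparse_vecs":    (B, T, 1),     # dims: [ -1, 1 ]    -> [B, T, 1]
    "colbert_vecs":   (B, T, 1024),  # dims: [ -1, 1024 ] -> [B, T, 1024]
}

for name, shape in shapes.items():
    print(f"{name}: {shape}")
```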
- Run with the tritonserver Docker image

```shell
docker run --gpus all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v ./model_repository:/models \
  nvcr.io/nvidia/tritonserver:24.12-py3 \
  tritonserver --model-repository=/models
```
- Infer with `tritonclient`
```python
from typing import List

from datasets import load_dataset
from transformers import AutoTokenizer
from tritonclient.http import InferenceServerClient, InferInput

BS = 4
TOKENIZER_NAME = "BAAI/bge-m3"
TRITON_MODEL_NAME = "bge-m3"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
data: List[str] = [x["text"] for x in load_dataset("BeiR/scidocs", "corpus")["corpus"]]
batch = data[:BS]

client = InferenceServerClient("localhost:8000")

# Tokenize on the client side; the Triton model consumes raw token ids.
tokenized = tokenizer(batch, padding=True, truncation=True, return_tensors="np")
input_ids, attention_mask = tokenized.input_ids, tokenized.attention_mask

inputs = [
    InferInput("input_ids", list(input_ids.shape), "INT64"),
    InferInput("attention_mask", list(attention_mask.shape), "INT64"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(attention_mask)

results = client.infer(TRITON_MODEL_NAME, inputs)

dense_vecs = results.as_numpy("dense_vecs")                # [BS, 1024]
sparse_vecs = results.as_numpy("sparse_vecs").squeeze(-1)  # [BS, seq_len, 1] -> [BS, seq_len]
colbert_vecs = results.as_numpy("colbert_vecs")            # [BS, seq_len, 1024]

output = {
    "dense_vecs": dense_vecs.tolist(),
    "sparse_vecs": sparse_vecs.tolist(),
    "colbert_vecs": colbert_vecs.tolist(),
}
print(output)
```
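The `sparse_vecs` output holds one weight per input token position. To get a lexical token-id → weight mapping (similar in spirit to FlagEmbedding's `lexical_weights`), the per-position weights can be aggregated by token id. The sketch below is a hypothetical post-processing helper: the function name `sparse_to_lexical`, the special-token ids, and the dummy arrays are illustrative assumptions, not part of this repository's output.

```python
import numpy as np

def sparse_to_lexical(token_ids, weights, special_ids=(0, 1, 2)):
    """Aggregate per-position sparse weights into a {token_id: weight} dict.

    Keeps the max weight when a token id repeats; drops special tokens
    (the ids in `special_ids` are illustrative) and non-positive weights.
    """
    lexical = {}
    for tid, w in zip(token_ids, weights):
        if tid in special_ids or w <= 0:
            continue
        lexical[int(tid)] = max(lexical.get(int(tid), 0.0), float(w))
    return lexical

# Dummy stand-ins for tokenized.input_ids[i] and sparse_vecs[i]:
token_ids = np.array([0, 37, 91, 37, 2])     # e.g. <s> tok_a tok_b tok_a </s>
weights   = np.array([0.0, 0.30, 0.12, 0.45, 0.0])
print(sparse_to_lexical(token_ids, weights))  # {37: 0.45, 91: 0.12}
```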
This model (JeremyHibiki/bge-m3-onnx-o4) is derived from the base model BAAI/bge-m3.