Commit 3960134 (verified) by JeremyHibiki · parent 4a1f7fc

Create README.md

Based on `aapot/bge-m3-onnx` and `philipchung/bge-m3-onnx`

## Deploy with tritonserver

- Folder structure

```
├── model_repository
│   └── bge-m3
│       ├── 1
│       │   ├── model.onnx
│       │   └── model.onnx.data
│       └── config.pbtxt
```
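
The layout above can be scaffolded with a short script. This is a sketch using only the standard library; the `build_model_repository` helper is hypothetical, and the actual `model.onnx` / `model.onnx.data` files still have to be copied in separately.

```python
from pathlib import Path


def build_model_repository(root: Path, model_name: str = "bge-m3", version: str = "1") -> Path:
    """Create the Triton model-repository layout shown above.

    Only the directories and an empty config.pbtxt are created; the ONNX
    model files must be placed into the version directory by hand.
    """
    version_dir = root / "model_repository" / model_name / version
    version_dir.mkdir(parents=True, exist_ok=True)
    # config.pbtxt sits next to the version directory, not inside it
    (version_dir.parent / "config.pbtxt").touch(exist_ok=True)
    return version_dir


# Example: build_model_repository(Path("."))
```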

- `config.pbtxt` file

```
name: "bge-m3"
backend: "onnxruntime"
max_batch_size: 4

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]

output [
  {
    name: "dense_vecs"
    data_type: TYPE_FP32
    dims: [ 1024 ]
  },
  {
    name: "sparse_vecs"
    data_type: TYPE_FP32
    dims: [ -1, 1 ]
  },
  {
    name: "colbert_vecs"
    data_type: TYPE_FP32
    dims: [ -1, 1024 ]
  }
]
```
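
Because `max_batch_size > 0`, Triton prepends an implicit batch dimension to every shape in the config, and each `-1` resolves to the padded sequence length of the request. The shapes a request should get back can be predicted with a small helper (hypothetical, for illustration only):

```python
def expected_output_shapes(batch_size: int, seq_len: int) -> dict:
    """Predict per-request output shapes for the config above.

    Triton adds the leading batch dimension (max_batch_size > 0), and
    each -1 in the config resolves to the padded token length.
    """
    return {
        "dense_vecs": (batch_size, 1024),             # one pooled vector per text
        "sparse_vecs": (batch_size, seq_len, 1),      # one weight per token
        "colbert_vecs": (batch_size, seq_len, 1024),  # one vector per token
    }
```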

- Run with tritonserver docker image

```bash
docker run --gpus all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v ./model_repository:/models \
  nvcr.nju.edu.cn/nvidia/tritonserver:24.12-py3 \
  tritonserver --model-repository=/models
```

- Infer with `tritonclient`

```python
from typing import List

from datasets import load_dataset
from transformers import AutoTokenizer
from tritonclient.http import InferenceServerClient, InferInput

BS = 4
TOKENIZER_NAME = "BAAI/bge-m3"
TRITON_MODEL_NAME = "bge-m3"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
data: List[str] = [x["text"] for x in load_dataset("BeIR/scidocs", "corpus")["corpus"]]
batch = data[:BS]

client = InferenceServerClient("localhost:8000", concurrency=1)

# Tokenize on the client side; the ONNX model takes raw token ids.
tokenized = tokenizer(batch, padding=True, truncation=True, return_tensors="np")
input_ids, attention_mask = tokenized.input_ids, tokenized.attention_mask

inputs = [
    InferInput("input_ids", [len(batch), len(input_ids[0])], "INT64"),
    InferInput("attention_mask", [len(batch), len(attention_mask[0])], "INT64"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(attention_mask)

results = client.infer(TRITON_MODEL_NAME, inputs)

dense_vecs = results.as_numpy("dense_vecs")
sparse_vecs = results.as_numpy("sparse_vecs").squeeze(-1)  # drop trailing singleton dim
colbert_vecs = results.as_numpy("colbert_vecs")

output = {
    "dense_vecs": dense_vecs.tolist(),
    "sparse_vecs": sparse_vecs.tolist(),
    "colbert_vecs": colbert_vecs.tolist(),
}
print(output)
```
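
The returned dense vectors can be compared directly with cosine similarity. A minimal sketch with NumPy, using random stand-in data rather than real model output (the `cosine_similarity_matrix` helper is an assumption, not part of `tritonclient`):

```python
import numpy as np


def cosine_similarity_matrix(dense_vecs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (batch, dim) array."""
    norms = np.linalg.norm(dense_vecs, axis=1, keepdims=True)
    normalized = dense_vecs / np.clip(norms, 1e-12, None)
    return normalized @ normalized.T


# Stand-in for results.as_numpy("dense_vecs"); shape matches the config above.
rng = np.random.default_rng(0)
fake_dense = rng.normal(size=(4, 1024)).astype(np.float32)
sims = cosine_similarity_matrix(fake_dense)
print(sims.shape)  # (4, 4); diagonal entries are ~1.0
```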

`README.md` front matter:

```yaml
---
base_model:
- BAAI/bge-m3
pipeline_tag: feature-extraction
tags:
- bge-m3
- onnx
---
```