ONNX GPU Runtime with O4 for BAAI/bge-reranker-large

Benchmark: https://colab.research.google.com/drive/1HP9GQKdzYa6H9SJnAZoxJWq920gxwd2k

Convert

!optimum-cli export onnx -m BAAI/bge-reranker-large --optimize O4 bge-reranker-large-onnx-o4 --device cuda

Usage

# pip install "optimum[onnxruntime-gpu]" transformers

import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('swulling/bge-reranker-large-onnx-o4')
model = ORTModelForSequenceClassification.from_pretrained('swulling/bge-reranker-large-onnx-o4')
model.to("cuda")

# Each pair is [query, passage]; the model returns one relevance logit per pair.
pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # Keep the input tensors on the same device as the model.
    inputs = inputs.to(model.device)
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
    print(scores)
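
The printed scores are raw logits, so they are unbounded and only meaningful relative to each other. If a bounded score or a sorted result list is more convenient, a sigmoid followed by a sort is one common way to post-process them. The snippet below is a minimal sketch of that step, not part of this model's API: `rank_passages` is a hypothetical helper, and the logits are made-up values standing in for `scores.tolist()`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, maps any real logit into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_passages(passages, logits):
    """Hypothetical helper: pair each passage with sigmoid(logit), best match first."""
    scored = [(passage, sigmoid(z)) for passage, z in zip(passages, logits)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up logits for illustration only; in practice pass scores.tolist().
passages = ["hi", "The giant panda is a bear species endemic to China."]
example_logits = [-5.0, 5.0]

ranked = rank_passages(passages, example_logits)
for passage, score in ranked:
    print(f"{score:.4f}  {passage}")
```

The sigmoid is monotonic, so it does not change the ranking; it only makes the values easier to threshold or display.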

Source model

https://huggingface.co/BAAI/bge-reranker-large
