PubMedBERT Embeddings Matryoshka - ONNX - O4

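O4-optimized ONNX weights of NeuML/pubmedbert-base-embeddings-matryoshka. O4 is Optimum's GPU-only optimization level that adds fp16 mixed precision, so the example below expects a CUDA device.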

Usage

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch

# Mean Pooling - Take attention mask into account for correct averaging
def meanpooling(output, mask):
    embeddings = output[0] # First element of model_output contains all token embeddings
    mask = mask.unsqueeze(-1).expand(embeddings.size()).float()
    return torch.sum(embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

model = ORTModelForFeatureExtraction.from_pretrained("hooman650/pubmedbert-base-embeddings-matryoshka-onnx-04", provider="CUDAExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained("hooman650/pubmedbert-base-embeddings-matryoshka-onnx-04")

# Tokenize sentences
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt').to("cuda") # if on GPU

# Compute token embeddings
with torch.no_grad():
    output = model(**inputs)

# Perform pooling. In this case, mean pooling.
embeddings = meanpooling(output, inputs['attention_mask'])

# Requested matryoshka dimensions
dimensions = 256

print("Sentence embeddings:")
print(embeddings[:, :dimensions])
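
Truncating a matryoshka embedding changes its length, so the truncated vectors are usually re-normalized before they are compared. The following is a minimal sketch of that step, not part of the original model card. The helper name `truncate_and_normalize` is hypothetical, and a random tensor stands in for the pooled `embeddings` computed above (768 columns is assumed here, the hidden size of a BERT-base model).

```python
import torch
import torch.nn.functional as F

def truncate_and_normalize(embeddings: torch.Tensor, dimensions: int) -> torch.Tensor:
    """Keep the first `dimensions` components and rescale each row to unit L2 norm."""
    truncated = embeddings[:, :dimensions]
    return F.normalize(truncated, p=2, dim=1)

# Stand-in for the pooled embeddings (2 sentences, assumed 768 dimensions)
torch.manual_seed(0)
pooled = torch.randn(2, 768)

reduced = truncate_and_normalize(pooled, 256)

# With unit-length rows, the dot product equals cosine similarity
similarity = reduced @ reduced.T

print(reduced.shape)
print(similarity)
```

In the usage example above, the same helper could be applied as `truncate_and_normalize(embeddings, dimensions)` in place of the plain slice.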