license: apache-2.0

This model is a merge of bge-small-en-v1.5, GIST-Embedding-v0, and gte-base. It targets retrieval tasks while also performing well on a variety of other tasks (see the experiment details below).
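The evaluation table labels this model's rows "Slerp", which suggests the weights were combined by spherical linear interpolation. The exact merge recipe is not stated here, so the following is only a generic sketch of Slerp between two flat weight vectors. The function name `slerp`, the interpolation factor `t`, and the toy inputs are illustrative assumptions, not the authors' code.

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    Hypothetical helper for illustration only; real merges operate on
    full model tensors rather than short Python lists.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Plain linear interpolation, used as a fallback for degenerate inputs.
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return lerp
    # Angle between the two vectors.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Vectors are (anti)parallel, so Slerp is undefined; fall back to lerp.
        return lerp
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Toy example: halfway between two orthogonal unit vectors.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(midpoint)
```

Unlike a plain average, Slerp keeps the interpolated vector on the arc between the two inputs, which is why the midpoint of two unit vectors is again a unit vector.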

Usage

For retrieval tasks

from transformers import AutoTokenizer, AutoModel
import torch
# Sentences we want sentence embeddings for
sentences = ["this is a test sentence", "this is another test sentence"]

# Prefixing for retrieval tasks
instruction = "Represent this sentence for searching relevant passages: "

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('Marqo/marqo-merged-bge-gist-gte-base')
model = AutoModel.from_pretrained('Marqo/marqo-merged-bge-gist-gte-base')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
encoded_input_with_prefixing = tokenizer([instruction + s for s in sentences], padding=True, truncation=True, return_tensors='pt')

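# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    model_output_with_prefixing = model(**encoded_input_with_prefixing)
    # Perform pooling. In this case, CLS pooling: take the first token of the last hidden state.
    # Pooling happens before averaging because the two inputs have different sequence lengths.
    cls_embeddings = model_output[0][:, 0]
    cls_embeddings_with_prefixing = model_output_with_prefixing[0][:, 0]
    # Average the plain and prefixed embeddings
    sentence_embeddings = (cls_embeddings + cls_embeddings_with_prefixing) / 2
# Normalize embeddings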
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
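Once embeddings are available, retrieval usually comes down to ranking passages by their similarity to a query. The sketch below shows that step with cosine similarity on plain Python lists, so it runs without downloading the model. The helper names `cosine_similarity` and `rank_passages` and the toy vectors are hypothetical; with the tensors from the example above you would pass the rows of `sentence_embeddings` instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_embedding, passage_embeddings):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [cosine_similarity(query_embedding, p) for p in passage_embeddings]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy 2-dimensional "embeddings" standing in for real model output.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

ranking = rank_passages(query, passages)
print([index for index, _ in ranking])  # → [1, 2, 0]
```

Because the embeddings in the usage example are already L2-normalized, a plain dot product gives the same ranking as cosine similarity there.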

Evaluation

109M models retrieval benchmarks
| Models | Average | ArguAna | ClimateFEVER | CQADupstackRetrieval | DBPedia | FEVER | FiQA2018 | HotpotQA | MSMARCO | NFCorpus | NQ | QuoraRetrieval | SCIDOCS | SciFact | Touche2020 | TRECCOVID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Slerp (average prefixing) | 53.37 | 63.45 | 31.66 | 42.51 | 42.15 | 86.53 | 43.28 | 73.72 | 39.16 | 37.66 | 54.39 | 88.96 | 22.7 | 76.66 | 21.46 | 76.23 |
| Slerp (prefixing) | 53.18 | 63.58 | 30.67 | 43.23 | 41.52 | 86.54 | 41.28 | 71.43 | 41.16 | 38.01 | 55.28 | 88.72 | 22.75 | 75.05 | 22.15 | 76.27 |
| BGE | 53.25 | 63.61 | 31.17 | 42.35 | 40.77 | 86.29 | 40.65 | 72.6 | 41.35 | 37.39 | 54.15 | 88.9 | 21.73 | 74.04 | 25.7 | 78.07 |
| GIST | 52.31 | 62.62 | 31.49 | 43.2 | 41.7 | 86.65 | 40.64 | 68.92 | 40.64 | 37.64 | 53.43 | 88.81 | 23.47 | 75.29 | 20.58 | 69.6 |
| GTE | 51.14 | 57.12 | 28.1 | 42.91 | 41.19 | 81.52 | 40.76 | 65.75 | 40.21 | 37.9 | 52.84 | 88.15 | 23.13 | 76.18 | 22.55 | 68.78 |
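The Average column appears to be the unweighted mean of the 15 per-dataset scores. As a sanity check, the short sketch below recomputes it for the "Slerp (average prefixing)" row. The scores are copied from the table; the variable names are made up for this example.

```python
# Per-dataset scores for the "Slerp (average prefixing)" row, copied from the table.
slerp_average_prefixing = {
    "ArguAna": 63.45,
    "ClimateFEVER": 31.66,
    "CQADupstackRetrieval": 42.51,
    "DBPedia": 42.15,
    "FEVER": 86.53,
    "FiQA2018": 43.28,
    "HotpotQA": 73.72,
    "MSMARCO": 39.16,
    "NFCorpus": 37.66,
    "NQ": 54.39,
    "QuoraRetrieval": 88.96,
    "SCIDOCS": 22.7,
    "SciFact": 76.66,
    "Touche2020": 21.46,
    "TRECCOVID": 76.23,
}

# Unweighted mean over the 15 datasets.
average = sum(slerp_average_prefixing.values()) / len(slerp_average_prefixing)
print(round(average, 2))  # → 53.37
```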