Snowflake/snowflake-arctic-embed-m

Nov 20

In the model card you have:

Query: what is snowflake?
0.20051965 The Data Cloud!
0.07660701 Mexico City of Course!
Query: Where can I get the best tacos?
0.24481852 Mexico City of Course!
0.15664819 The Data Cloud!

but I am getting:

Query: what is snowflake?
0.2747492 The Data Cloud!
0.19998044 Mexico City of Course!
Query: Where can I get the best tacos?
0.29974818 Mexico City of Course!
0.2344071 The Data Cloud!

tomaarsen

Nov 20

Hello!

Interesting, back when I contributed the integration, it gave the original results:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m", revision="ceb801b3fa8ffbf10e2809da664edb3775bcba8b")

queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

scores = query_embeddings @ document_embeddings.T
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    # Output passages & scores
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)

Since commit ceb801b3fa8ffbf10e2809da664edb3775bcba8b something must have changed.

Oh, #6 updated the model; it was previously incorrect.
@spacemanidol I'd recommend updating the README outputs to

Query: what is snowflake?
0.2747492 The Data Cloud!
0.19998045 Mexico City of Course!
Query: Where can I get the best tacos?
0.29974818 Mexico City of Course!
0.2344071 The Data Cloud!

to avoid this confusion.
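For anyone wanting to tell float noise apart from a real change, here is a minimal sketch using the scores quoted in this thread. The helper name `mismatches` and the `abs_tol=1e-6` threshold are my own assumptions for illustration, not part of sentence-transformers or the model card.

```python
import math

Q1 = "what is snowflake?"
Q2 = "Where can I get the best tacos?"
D1 = "The Data Cloud!"
D2 = "Mexico City of Course!"

# Scores copied from this thread, keyed by (query, document).
readme_scores = {(Q1, D1): 0.20051965, (Q1, D2): 0.07660701,
                 (Q2, D2): 0.24481852, (Q2, D1): 0.15664819}
observed_scores = {(Q1, D1): 0.2747492, (Q1, D2): 0.19998044,
                   (Q2, D2): 0.29974818, (Q2, D1): 0.2344071}
suggested_scores = {(Q1, D1): 0.2747492, (Q1, D2): 0.19998045,
                    (Q2, D2): 0.29974818, (Q2, D1): 0.2344071}


def mismatches(expected, actual, abs_tol=1e-6):
    """Hypothetical helper: keys whose scores differ by more than abs_tol.

    abs_tol=1e-6 is an assumed threshold for float32 rounding noise.
    """
    return [
        key for key in expected
        if not math.isclose(expected[key], actual[key], rel_tol=0.0, abs_tol=abs_tol)
    ]


# Old README values vs. what the current model gives: every pair differs.
print(len(mismatches(readme_scores, observed_scores)))
# Suggested README values vs. observed: only last-digit noise, so no mismatches.
print(len(mismatches(suggested_scores, observed_scores)))
```

With these numbers, the old README scores disagree on all four pairs, while the suggested ones match within the assumed tolerance.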

Tom Aarsen

spacemanidol

Snowflake org Nov 22

Thanks for calling this out. Fixed it.

spacemanidol changed discussion status to closed Nov 22

djstrong

29 days ago

Thanks. Were the MTEB results calculated after the correction? I am asking because the Snowflake models score much worse on my dataset than other models do, and much worse than their MTEB average would suggest.
