library_name: peft | |
license: mit | |
language: | |
- en | |
pipeline_tag: sentence-similarity | |
tags: | |
- text-embedding | |
- embeddings | |
- information-retrieval | |
- beir | |
- text-classification | |
- language-model | |
- text-clustering | |
- text-semantic-similarity | |
- text-evaluation | |
- text-reranking | |
- feature-extraction | |
- sentence-similarity | |
- Sentence Similarity | |
- natural_questions | |
- ms_marco | |
- fever | |
- hotpot_qa | |
- mteb | |
model-index: | |
- name: LLM2Vec-Mistral-7B-supervised | |
results: | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_counterfactual | |
name: MTEB AmazonCounterfactualClassification (en) | |
config: en | |
split: test | |
revision: e8379541af4e31359cca9fbcf4b00f2671dba205 | |
metrics: | |
- type: accuracy | |
value: 77.58208955223881 | |
- type: ap | |
value: 41.45474097979136 | |
- type: f1 | |
value: 71.76059891468786 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_polarity | |
name: MTEB AmazonPolarityClassification | |
config: default | |
split: test | |
revision: e2d317d38cd51312af73b3d32a06d1a08b442046 | |
metrics: | |
- type: accuracy | |
value: 91.12039999999999 | |
- type: ap | |
value: 88.01002974730474 | |
- type: f1 | |
value: 91.1049266954883 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_reviews_multi | |
name: MTEB AmazonReviewsClassification (en) | |
config: en | |
split: test | |
revision: 1399c76144fd37290681b995c656ef9b2e06e26d | |
metrics: | |
- type: accuracy | |
value: 49.966 | |
- type: f1 | |
value: 48.908221884634386 | |
- task: | |
type: Retrieval | |
dataset: | |
type: arguana | |
name: MTEB ArguAna | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 32.788000000000004 | |
- type: map_at_10 | |
value: 48.665000000000006 | |
- type: map_at_100 | |
value: 49.501 | |
- type: map_at_1000 | |
value: 49.504 | |
- type: map_at_3 | |
value: 43.883 | |
- type: map_at_5 | |
value: 46.501 | |
- type: mrr_at_1 | |
value: 33.357 | |
- type: mrr_at_10 | |
value: 48.882 | |
- type: mrr_at_100 | |
value: 49.718 | |
- type: mrr_at_1000 | |
value: 49.721 | |
- type: mrr_at_3 | |
value: 44.025999999999996 | |
- type: mrr_at_5 | |
value: 46.732 | |
- type: ndcg_at_1 | |
value: 32.788000000000004 | |
- type: ndcg_at_10 | |
value: 57.483 | |
- type: ndcg_at_100 | |
value: 60.745000000000005 | |
- type: ndcg_at_1000 | |
value: 60.797000000000004 | |
- type: ndcg_at_3 | |
value: 47.534 | |
- type: ndcg_at_5 | |
value: 52.266 | |
- type: precision_at_1 | |
value: 32.788000000000004 | |
- type: precision_at_10 | |
value: 8.57 | |
- type: precision_at_100 | |
value: 0.993 | |
- type: precision_at_1000 | |
value: 0.1 | |
- type: precision_at_3 | |
value: 19.369 | |
- type: precision_at_5 | |
value: 13.926 | |
- type: recall_at_1 | |
value: 32.788000000000004 | |
- type: recall_at_10 | |
value: 85.70400000000001 | |
- type: recall_at_100 | |
value: 99.289 | |
- type: recall_at_1000 | |
value: 99.644 | |
- type: recall_at_3 | |
value: 58.108000000000004 | |
- type: recall_at_5 | |
value: 69.63000000000001 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-p2p | |
name: MTEB ArxivClusteringP2P | |
config: default | |
split: test | |
revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d | |
metrics: | |
- type: v_measure | |
value: 42.805075760047906 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-s2s | |
name: MTEB ArxivClusteringS2S | |
config: default | |
split: test | |
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 | |
metrics: | |
- type: v_measure | |
value: 44.235789514284214 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/askubuntudupquestions-reranking | |
name: MTEB AskUbuntuDupQuestions | |
config: default | |
split: test | |
revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 | |
metrics: | |
- type: map | |
value: 63.98320383943591 | |
- type: mrr | |
value: 76.53189992525174 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/biosses-sts | |
name: MTEB BIOSSES | |
config: default | |
split: test | |
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a | |
metrics: | |
- type: cos_sim_spearman | |
value: 85.24411101959603 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/banking77 | |
name: MTEB Banking77Classification | |
config: default | |
split: test | |
revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 | |
metrics: | |
- type: accuracy | |
value: 88.31493506493506 | |
- type: f1 | |
value: 88.28524975751309 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-p2p | |
name: MTEB BiorxivClusteringP2P | |
config: default | |
split: test | |
revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 | |
metrics: | |
- type: v_measure | |
value: 34.27007175430729 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-s2s | |
name: MTEB BiorxivClusteringS2S | |
config: default | |
split: test | |
revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 | |
metrics: | |
- type: v_measure | |
value: 35.52517776034658 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/android | |
name: MTEB CQADupstackAndroidRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 38.686 | |
- type: map_at_10 | |
value: 51.939 | |
- type: map_at_100 | |
value: 53.751000000000005 | |
- type: map_at_1000 | |
value: 53.846000000000004 | |
- type: map_at_3 | |
value: 48.296 | |
- type: map_at_5 | |
value: 50.312999999999995 | |
- type: mrr_at_1 | |
value: 49.641999999999996 | |
- type: mrr_at_10 | |
value: 59.157000000000004 | |
- type: mrr_at_100 | |
value: 59.85 | |
- type: mrr_at_1000 | |
value: 59.876 | |
- type: mrr_at_3 | |
value: 57.058 | |
- type: mrr_at_5 | |
value: 58.231 | |
- type: ndcg_at_1 | |
value: 49.641999999999996 | |
- type: ndcg_at_10 | |
value: 58.714 | |
- type: ndcg_at_100 | |
value: 63.776999999999994 | |
- type: ndcg_at_1000 | |
value: 64.95 | |
- type: ndcg_at_3 | |
value: 54.799 | |
- type: ndcg_at_5 | |
value: 56.372 | |
- type: precision_at_1 | |
value: 49.641999999999996 | |
- type: precision_at_10 | |
value: 11.373 | |
- type: precision_at_100 | |
value: 1.712 | |
- type: precision_at_1000 | |
value: 0.209 | |
- type: precision_at_3 | |
value: 27.229 | |
- type: precision_at_5 | |
value: 19.056 | |
- type: recall_at_1 | |
value: 38.686 | |
- type: recall_at_10 | |
value: 69.976 | |
- type: recall_at_100 | |
value: 90.512 | |
- type: recall_at_1000 | |
value: 97.64 | |
- type: recall_at_3 | |
value: 56.625 | |
- type: recall_at_5 | |
value: 62.348000000000006 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/english | |
name: MTEB CQADupstackEnglishRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 36.356 | |
- type: map_at_10 | |
value: 48.004000000000005 | |
- type: map_at_100 | |
value: 49.342999999999996 | |
- type: map_at_1000 | |
value: 49.461 | |
- type: map_at_3 | |
value: 44.692 | |
- type: map_at_5 | |
value: 46.576 | |
- type: mrr_at_1 | |
value: 46.561 | |
- type: mrr_at_10 | |
value: 54.547000000000004 | |
- type: mrr_at_100 | |
value: 55.159000000000006 | |
- type: mrr_at_1000 | |
value: 55.193000000000005 | |
- type: mrr_at_3 | |
value: 52.516 | |
- type: mrr_at_5 | |
value: 53.701 | |
- type: ndcg_at_1 | |
value: 46.561 | |
- type: ndcg_at_10 | |
value: 53.835 | |
- type: ndcg_at_100 | |
value: 57.92699999999999 | |
- type: ndcg_at_1000 | |
value: 59.671 | |
- type: ndcg_at_3 | |
value: 49.997 | |
- type: ndcg_at_5 | |
value: 51.714000000000006 | |
- type: precision_at_1 | |
value: 46.561 | |
- type: precision_at_10 | |
value: 10.344000000000001 | |
- type: precision_at_100 | |
value: 1.5779999999999998 | |
- type: precision_at_1000 | |
value: 0.202 | |
- type: precision_at_3 | |
value: 24.437 | |
- type: precision_at_5 | |
value: 17.197000000000003 | |
- type: recall_at_1 | |
value: 36.356 | |
- type: recall_at_10 | |
value: 63.019000000000005 | |
- type: recall_at_100 | |
value: 80.55099999999999 | |
- type: recall_at_1000 | |
value: 91.38300000000001 | |
- type: recall_at_3 | |
value: 50.431000000000004 | |
- type: recall_at_5 | |
value: 56.00000000000001 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gaming | |
name: MTEB CQADupstackGamingRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 46.736 | |
- type: map_at_10 | |
value: 60.775999999999996 | |
- type: map_at_100 | |
value: 61.755 | |
- type: map_at_1000 | |
value: 61.783 | |
- type: map_at_3 | |
value: 57.293000000000006 | |
- type: map_at_5 | |
value: 59.382000000000005 | |
- type: mrr_at_1 | |
value: 54.232 | |
- type: mrr_at_10 | |
value: 64.424 | |
- type: mrr_at_100 | |
value: 64.996 | |
- type: mrr_at_1000 | |
value: 65.009 | |
- type: mrr_at_3 | |
value: 62.226000000000006 | |
- type: mrr_at_5 | |
value: 63.592000000000006 | |
- type: ndcg_at_1 | |
value: 54.232 | |
- type: ndcg_at_10 | |
value: 66.654 | |
- type: ndcg_at_100 | |
value: 70.152 | |
- type: ndcg_at_1000 | |
value: 70.648 | |
- type: ndcg_at_3 | |
value: 61.405 | |
- type: ndcg_at_5 | |
value: 64.137 | |
- type: precision_at_1 | |
value: 54.232 | |
- type: precision_at_10 | |
value: 10.607999999999999 | |
- type: precision_at_100 | |
value: 1.321 | |
- type: precision_at_1000 | |
value: 0.13899999999999998 | |
- type: precision_at_3 | |
value: 27.544 | |
- type: precision_at_5 | |
value: 18.645999999999997 | |
- type: recall_at_1 | |
value: 46.736 | |
- type: recall_at_10 | |
value: 80.10199999999999 | |
- type: recall_at_100 | |
value: 94.976 | |
- type: recall_at_1000 | |
value: 98.402 | |
- type: recall_at_3 | |
value: 66.094 | |
- type: recall_at_5 | |
value: 73.028 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gis | |
name: MTEB CQADupstackGisRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 30.238 | |
- type: map_at_10 | |
value: 39.798 | |
- type: map_at_100 | |
value: 40.892 | |
- type: map_at_1000 | |
value: 40.971000000000004 | |
- type: map_at_3 | |
value: 36.788 | |
- type: map_at_5 | |
value: 38.511 | |
- type: mrr_at_1 | |
value: 32.994 | |
- type: mrr_at_10 | |
value: 42.028 | |
- type: mrr_at_100 | |
value: 42.959 | |
- type: mrr_at_1000 | |
value: 43.010999999999996 | |
- type: mrr_at_3 | |
value: 39.322 | |
- type: mrr_at_5 | |
value: 40.977000000000004 | |
- type: ndcg_at_1 | |
value: 32.994 | |
- type: ndcg_at_10 | |
value: 45.062000000000005 | |
- type: ndcg_at_100 | |
value: 50.166999999999994 | |
- type: ndcg_at_1000 | |
value: 51.961 | |
- type: ndcg_at_3 | |
value: 39.378 | |
- type: ndcg_at_5 | |
value: 42.281 | |
- type: precision_at_1 | |
value: 32.994 | |
- type: precision_at_10 | |
value: 6.836 | |
- type: precision_at_100 | |
value: 0.9860000000000001 | |
- type: precision_at_1000 | |
value: 0.11800000000000001 | |
- type: precision_at_3 | |
value: 16.384 | |
- type: precision_at_5 | |
value: 11.548 | |
- type: recall_at_1 | |
value: 30.238 | |
- type: recall_at_10 | |
value: 59.080999999999996 | |
- type: recall_at_100 | |
value: 82.033 | |
- type: recall_at_1000 | |
value: 95.281 | |
- type: recall_at_3 | |
value: 43.902 | |
- type: recall_at_5 | |
value: 50.952 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/mathematica | |
name: MTEB CQADupstackMathematicaRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 21.512999999999998 | |
- type: map_at_10 | |
value: 31.339 | |
- type: map_at_100 | |
value: 32.651 | |
- type: map_at_1000 | |
value: 32.762 | |
- type: map_at_3 | |
value: 27.590999999999998 | |
- type: map_at_5 | |
value: 29.946 | |
- type: mrr_at_1 | |
value: 26.866 | |
- type: mrr_at_10 | |
value: 36.525 | |
- type: mrr_at_100 | |
value: 37.357 | |
- type: mrr_at_1000 | |
value: 37.419999999999995 | |
- type: mrr_at_3 | |
value: 33.085 | |
- type: mrr_at_5 | |
value: 35.379 | |
- type: ndcg_at_1 | |
value: 26.866 | |
- type: ndcg_at_10 | |
value: 37.621 | |
- type: ndcg_at_100 | |
value: 43.031000000000006 | |
- type: ndcg_at_1000 | |
value: 45.573 | |
- type: ndcg_at_3 | |
value: 31.046000000000003 | |
- type: ndcg_at_5 | |
value: 34.709 | |
- type: precision_at_1 | |
value: 26.866 | |
- type: precision_at_10 | |
value: 7.052 | |
- type: precision_at_100 | |
value: 1.117 | |
- type: precision_at_1000 | |
value: 0.145 | |
- type: precision_at_3 | |
value: 14.884 | |
- type: precision_at_5 | |
value: 11.517 | |
- type: recall_at_1 | |
value: 21.512999999999998 | |
- type: recall_at_10 | |
value: 51.751999999999995 | |
- type: recall_at_100 | |
value: 74.34100000000001 | |
- type: recall_at_1000 | |
value: 92.426 | |
- type: recall_at_3 | |
value: 34.008 | |
- type: recall_at_5 | |
value: 43.075 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/physics | |
name: MTEB CQADupstackPhysicsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 35.327 | |
- type: map_at_10 | |
value: 47.783 | |
- type: map_at_100 | |
value: 49.153999999999996 | |
- type: map_at_1000 | |
value: 49.260999999999996 | |
- type: map_at_3 | |
value: 44.145 | |
- type: map_at_5 | |
value: 46.207 | |
- type: mrr_at_1 | |
value: 44.37 | |
- type: mrr_at_10 | |
value: 53.864999999999995 | |
- type: mrr_at_100 | |
value: 54.625 | |
- type: mrr_at_1000 | |
value: 54.662 | |
- type: mrr_at_3 | |
value: 51.604000000000006 | |
- type: mrr_at_5 | |
value: 52.894 | |
- type: ndcg_at_1 | |
value: 44.37 | |
- type: ndcg_at_10 | |
value: 54.054 | |
- type: ndcg_at_100 | |
value: 59.168 | |
- type: ndcg_at_1000 | |
value: 60.769 | |
- type: ndcg_at_3 | |
value: 49.091 | |
- type: ndcg_at_5 | |
value: 51.444 | |
- type: precision_at_1 | |
value: 44.37 | |
- type: precision_at_10 | |
value: 9.827 | |
- type: precision_at_100 | |
value: 1.456 | |
- type: precision_at_1000 | |
value: 0.17600000000000002 | |
- type: precision_at_3 | |
value: 23.580000000000002 | |
- type: precision_at_5 | |
value: 16.554 | |
- type: recall_at_1 | |
value: 35.327 | |
- type: recall_at_10 | |
value: 66.43900000000001 | |
- type: recall_at_100 | |
value: 87.41600000000001 | |
- type: recall_at_1000 | |
value: 97.37400000000001 | |
- type: recall_at_3 | |
value: 51.64 | |
- type: recall_at_5 | |
value: 58.242000000000004 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/programmers | |
name: MTEB CQADupstackProgrammersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 32.397999999999996 | |
- type: map_at_10 | |
value: 44.932 | |
- type: map_at_100 | |
value: 46.336 | |
- type: map_at_1000 | |
value: 46.421 | |
- type: map_at_3 | |
value: 41.128 | |
- type: map_at_5 | |
value: 43.364999999999995 | |
- type: mrr_at_1 | |
value: 41.324 | |
- type: mrr_at_10 | |
value: 51.080000000000005 | |
- type: mrr_at_100 | |
value: 51.878 | |
- type: mrr_at_1000 | |
value: 51.910000000000004 | |
- type: mrr_at_3 | |
value: 48.382999999999996 | |
- type: mrr_at_5 | |
value: 50.004000000000005 | |
- type: ndcg_at_1 | |
value: 41.324 | |
- type: ndcg_at_10 | |
value: 51.466 | |
- type: ndcg_at_100 | |
value: 56.874 | |
- type: ndcg_at_1000 | |
value: 58.321999999999996 | |
- type: ndcg_at_3 | |
value: 45.928999999999995 | |
- type: ndcg_at_5 | |
value: 48.532 | |
- type: precision_at_1 | |
value: 41.324 | |
- type: precision_at_10 | |
value: 9.565999999999999 | |
- type: precision_at_100 | |
value: 1.428 | |
- type: precision_at_1000 | |
value: 0.172 | |
- type: precision_at_3 | |
value: 22.184 | |
- type: precision_at_5 | |
value: 15.867999999999999 | |
- type: recall_at_1 | |
value: 32.397999999999996 | |
- type: recall_at_10 | |
value: 64.512 | |
- type: recall_at_100 | |
value: 87.425 | |
- type: recall_at_1000 | |
value: 96.937 | |
- type: recall_at_3 | |
value: 48.513 | |
- type: recall_at_5 | |
value: 55.721 | |
- task: | |
type: Retrieval | |
dataset: | |
type: mteb/cqadupstack | |
name: MTEB CQADupstackRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 32.001916666666666 | |
- type: map_at_10 | |
value: 42.91216666666667 | |
- type: map_at_100 | |
value: 44.21125000000001 | |
- type: map_at_1000 | |
value: 44.314166666666665 | |
- type: map_at_3 | |
value: 39.579 | |
- type: map_at_5 | |
value: 41.497166666666665 | |
- type: mrr_at_1 | |
value: 38.669583333333335 | |
- type: mrr_at_10 | |
value: 47.708 | |
- type: mrr_at_100 | |
value: 48.4875 | |
- type: mrr_at_1000 | |
value: 48.530833333333334 | |
- type: mrr_at_3 | |
value: 45.196333333333335 | |
- type: mrr_at_5 | |
value: 46.702999999999996 | |
- type: ndcg_at_1 | |
value: 38.669583333333335 | |
- type: ndcg_at_10 | |
value: 48.842 | |
- type: ndcg_at_100 | |
value: 53.79400000000001 | |
- type: ndcg_at_1000 | |
value: 55.566416666666676 | |
- type: ndcg_at_3 | |
value: 43.70975 | |
- type: ndcg_at_5 | |
value: 46.204499999999996 | |
- type: precision_at_1 | |
value: 38.669583333333335 | |
- type: precision_at_10 | |
value: 8.652999999999999 | |
- type: precision_at_100 | |
value: 1.3168333333333333 | |
- type: precision_at_1000 | |
value: 0.164 | |
- type: precision_at_3 | |
value: 20.343249999999998 | |
- type: precision_at_5 | |
value: 14.426 | |
- type: recall_at_1 | |
value: 32.001916666666666 | |
- type: recall_at_10 | |
value: 61.31158333333334 | |
- type: recall_at_100 | |
value: 82.80691666666667 | |
- type: recall_at_1000 | |
value: 94.977 | |
- type: recall_at_3 | |
value: 46.63558333333333 | |
- type: recall_at_5 | |
value: 53.32383333333334 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/stats | |
name: MTEB CQADupstackStatsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 29.311999999999998 | |
- type: map_at_10 | |
value: 37.735 | |
- type: map_at_100 | |
value: 38.702 | |
- type: map_at_1000 | |
value: 38.803 | |
- type: map_at_3 | |
value: 35.17 | |
- type: map_at_5 | |
value: 36.6 | |
- type: mrr_at_1 | |
value: 33.282000000000004 | |
- type: mrr_at_10 | |
value: 41.059 | |
- type: mrr_at_100 | |
value: 41.881 | |
- type: mrr_at_1000 | |
value: 41.943000000000005 | |
- type: mrr_at_3 | |
value: 38.829 | |
- type: mrr_at_5 | |
value: 40.11 | |
- type: ndcg_at_1 | |
value: 33.282000000000004 | |
- type: ndcg_at_10 | |
value: 42.625 | |
- type: ndcg_at_100 | |
value: 47.313 | |
- type: ndcg_at_1000 | |
value: 49.683 | |
- type: ndcg_at_3 | |
value: 38.043 | |
- type: ndcg_at_5 | |
value: 40.217999999999996 | |
- type: precision_at_1 | |
value: 33.282000000000004 | |
- type: precision_at_10 | |
value: 6.748 | |
- type: precision_at_100 | |
value: 0.979 | |
- type: precision_at_1000 | |
value: 0.126 | |
- type: precision_at_3 | |
value: 16.462 | |
- type: precision_at_5 | |
value: 11.411 | |
- type: recall_at_1 | |
value: 29.311999999999998 | |
- type: recall_at_10 | |
value: 54.294 | |
- type: recall_at_100 | |
value: 75.82 | |
- type: recall_at_1000 | |
value: 93.19800000000001 | |
- type: recall_at_3 | |
value: 41.382999999999996 | |
- type: recall_at_5 | |
value: 46.898 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/tex | |
name: MTEB CQADupstackTexRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 22.823 | |
- type: map_at_10 | |
value: 31.682 | |
- type: map_at_100 | |
value: 32.864 | |
- type: map_at_1000 | |
value: 32.988 | |
- type: map_at_3 | |
value: 28.878999999999998 | |
- type: map_at_5 | |
value: 30.459000000000003 | |
- type: mrr_at_1 | |
value: 28.63 | |
- type: mrr_at_10 | |
value: 36.672 | |
- type: mrr_at_100 | |
value: 37.519999999999996 | |
- type: mrr_at_1000 | |
value: 37.588 | |
- type: mrr_at_3 | |
value: 34.262 | |
- type: mrr_at_5 | |
value: 35.653 | |
- type: ndcg_at_1 | |
value: 28.63 | |
- type: ndcg_at_10 | |
value: 37.158 | |
- type: ndcg_at_100 | |
value: 42.4 | |
- type: ndcg_at_1000 | |
value: 45.001000000000005 | |
- type: ndcg_at_3 | |
value: 32.529 | |
- type: ndcg_at_5 | |
value: 34.673 | |
- type: precision_at_1 | |
value: 28.63 | |
- type: precision_at_10 | |
value: 6.848 | |
- type: precision_at_100 | |
value: 1.111 | |
- type: precision_at_1000 | |
value: 0.152 | |
- type: precision_at_3 | |
value: 15.623000000000001 | |
- type: precision_at_5 | |
value: 11.218 | |
- type: recall_at_1 | |
value: 22.823 | |
- type: recall_at_10 | |
value: 48.559000000000005 | |
- type: recall_at_100 | |
value: 72.048 | |
- type: recall_at_1000 | |
value: 90.322 | |
- type: recall_at_3 | |
value: 35.134 | |
- type: recall_at_5 | |
value: 40.897 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/unix | |
name: MTEB CQADupstackUnixRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 32.79 | |
- type: map_at_10 | |
value: 43.578 | |
- type: map_at_100 | |
value: 44.782 | |
- type: map_at_1000 | |
value: 44.869 | |
- type: map_at_3 | |
value: 39.737 | |
- type: map_at_5 | |
value: 41.92 | |
- type: mrr_at_1 | |
value: 39.086 | |
- type: mrr_at_10 | |
value: 48.135 | |
- type: mrr_at_100 | |
value: 48.949 | |
- type: mrr_at_1000 | |
value: 48.995 | |
- type: mrr_at_3 | |
value: 45.086999999999996 | |
- type: mrr_at_5 | |
value: 46.939 | |
- type: ndcg_at_1 | |
value: 39.086 | |
- type: ndcg_at_10 | |
value: 49.736999999999995 | |
- type: ndcg_at_100 | |
value: 54.818999999999996 | |
- type: ndcg_at_1000 | |
value: 56.515 | |
- type: ndcg_at_3 | |
value: 43.503 | |
- type: ndcg_at_5 | |
value: 46.499 | |
- type: precision_at_1 | |
value: 39.086 | |
- type: precision_at_10 | |
value: 8.685 | |
- type: precision_at_100 | |
value: 1.2449999999999999 | |
- type: precision_at_1000 | |
value: 0.148 | |
- type: precision_at_3 | |
value: 19.963 | |
- type: precision_at_5 | |
value: 14.366000000000001 | |
- type: recall_at_1 | |
value: 32.79 | |
- type: recall_at_10 | |
value: 63.766 | |
- type: recall_at_100 | |
value: 85.465 | |
- type: recall_at_1000 | |
value: 96.90299999999999 | |
- type: recall_at_3 | |
value: 46.515 | |
- type: recall_at_5 | |
value: 54.178000000000004 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/webmasters | |
name: MTEB CQADupstackWebmastersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 29.896 | |
- type: map_at_10 | |
value: 41.241 | |
- type: map_at_100 | |
value: 43.178 | |
- type: map_at_1000 | |
value: 43.395 | |
- type: map_at_3 | |
value: 37.702999999999996 | |
- type: map_at_5 | |
value: 39.524 | |
- type: mrr_at_1 | |
value: 36.364000000000004 | |
- type: mrr_at_10 | |
value: 46.184999999999995 | |
- type: mrr_at_100 | |
value: 47.051 | |
- type: mrr_at_1000 | |
value: 47.085 | |
- type: mrr_at_3 | |
value: 43.478 | |
- type: mrr_at_5 | |
value: 44.98 | |
- type: ndcg_at_1 | |
value: 36.364000000000004 | |
- type: ndcg_at_10 | |
value: 48.044 | |
- type: ndcg_at_100 | |
value: 53.818999999999996 | |
- type: ndcg_at_1000 | |
value: 55.504 | |
- type: ndcg_at_3 | |
value: 42.604 | |
- type: ndcg_at_5 | |
value: 44.971 | |
- type: precision_at_1 | |
value: 36.364000000000004 | |
- type: precision_at_10 | |
value: 9.664 | |
- type: precision_at_100 | |
value: 1.917 | |
- type: precision_at_1000 | |
value: 0.255 | |
- type: precision_at_3 | |
value: 20.487 | |
- type: precision_at_5 | |
value: 14.862 | |
- type: recall_at_1 | |
value: 29.896 | |
- type: recall_at_10 | |
value: 60.28 | |
- type: recall_at_100 | |
value: 86.271 | |
- type: recall_at_1000 | |
value: 97.121 | |
- type: recall_at_3 | |
value: 44.885999999999996 | |
- type: recall_at_5 | |
value: 51.351 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/wordpress | |
name: MTEB CQADupstackWordpressRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 27.948 | |
- type: map_at_10 | |
value: 36.138999999999996 | |
- type: map_at_100 | |
value: 37.126999999999995 | |
- type: map_at_1000 | |
value: 37.21 | |
- type: map_at_3 | |
value: 33.526 | |
- type: map_at_5 | |
value: 35.163 | |
- type: mrr_at_1 | |
value: 30.684 | |
- type: mrr_at_10 | |
value: 38.818999999999996 | |
- type: mrr_at_100 | |
value: 39.625 | |
- type: mrr_at_1000 | |
value: 39.678000000000004 | |
- type: mrr_at_3 | |
value: 36.506 | |
- type: mrr_at_5 | |
value: 37.976 | |
- type: ndcg_at_1 | |
value: 30.684 | |
- type: ndcg_at_10 | |
value: 41.134 | |
- type: ndcg_at_100 | |
value: 46.081 | |
- type: ndcg_at_1000 | |
value: 48.199999999999996 | |
- type: ndcg_at_3 | |
value: 36.193 | |
- type: ndcg_at_5 | |
value: 38.903999999999996 | |
- type: precision_at_1 | |
value: 30.684 | |
- type: precision_at_10 | |
value: 6.285 | |
- type: precision_at_100 | |
value: 0.9520000000000001 | |
- type: precision_at_1000 | |
value: 0.126 | |
- type: precision_at_3 | |
value: 15.342 | |
- type: precision_at_5 | |
value: 10.869 | |
- type: recall_at_1 | |
value: 27.948 | |
- type: recall_at_10 | |
value: 53.959 | |
- type: recall_at_100 | |
value: 76.825 | |
- type: recall_at_1000 | |
value: 92.73700000000001 | |
- type: recall_at_3 | |
value: 40.495999999999995 | |
- type: recall_at_5 | |
value: 47.196 | |
- task: | |
type: Retrieval | |
dataset: | |
type: climate-fever | |
name: MTEB ClimateFEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 15.27 | |
- type: map_at_10 | |
value: 25.570999999999998 | |
- type: map_at_100 | |
value: 27.664 | |
- type: map_at_1000 | |
value: 27.848 | |
- type: map_at_3 | |
value: 21.224 | |
- type: map_at_5 | |
value: 23.508000000000003 | |
- type: mrr_at_1 | |
value: 34.137 | |
- type: mrr_at_10 | |
value: 46.583000000000006 | |
- type: mrr_at_100 | |
value: 47.339999999999996 | |
- type: mrr_at_1000 | |
value: 47.370000000000005 | |
- type: mrr_at_3 | |
value: 43.376999999999995 | |
- type: mrr_at_5 | |
value: 45.26 | |
- type: ndcg_at_1 | |
value: 34.137 | |
- type: ndcg_at_10 | |
value: 35.189 | |
- type: ndcg_at_100 | |
value: 42.568 | |
- type: ndcg_at_1000 | |
value: 45.660000000000004 | |
- type: ndcg_at_3 | |
value: 28.965000000000003 | |
- type: ndcg_at_5 | |
value: 31.169999999999998 | |
- type: precision_at_1 | |
value: 34.137 | |
- type: precision_at_10 | |
value: 10.971 | |
- type: precision_at_100 | |
value: 1.8870000000000002 | |
- type: precision_at_1000 | |
value: 0.247 | |
- type: precision_at_3 | |
value: 21.368000000000002 | |
- type: precision_at_5 | |
value: 16.573 | |
- type: recall_at_1 | |
value: 15.27 | |
- type: recall_at_10 | |
value: 41.516999999999996 | |
- type: recall_at_100 | |
value: 66.486 | |
- type: recall_at_1000 | |
value: 83.533 | |
- type: recall_at_3 | |
value: 26.325 | |
- type: recall_at_5 | |
value: 32.574 | |
- task: | |
type: Retrieval | |
dataset: | |
type: dbpedia-entity | |
name: MTEB DBPedia | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 9.982000000000001 | |
- type: map_at_10 | |
value: 23.724999999999998 | |
- type: map_at_100 | |
value: 33.933 | |
- type: map_at_1000 | |
value: 35.965 | |
- type: map_at_3 | |
value: 16.158 | |
- type: map_at_5 | |
value: 19.433 | |
- type: mrr_at_1 | |
value: 75.75 | |
- type: mrr_at_10 | |
value: 82.065 | |
- type: mrr_at_100 | |
value: 82.334 | |
- type: mrr_at_1000 | |
value: 82.34 | |
- type: mrr_at_3 | |
value: 80.708 | |
- type: mrr_at_5 | |
value: 81.671 | |
- type: ndcg_at_1 | |
value: 63.625 | |
- type: ndcg_at_10 | |
value: 49.576 | |
- type: ndcg_at_100 | |
value: 53.783 | |
- type: ndcg_at_1000 | |
value: 61.012 | |
- type: ndcg_at_3 | |
value: 53.822 | |
- type: ndcg_at_5 | |
value: 51.72 | |
- type: precision_at_1 | |
value: 75.75 | |
- type: precision_at_10 | |
value: 39.925 | |
- type: precision_at_100 | |
value: 12.525 | |
- type: precision_at_1000 | |
value: 2.399 | |
- type: precision_at_3 | |
value: 56.667 | |
- type: precision_at_5 | |
value: 50.5 | |
- type: recall_at_1 | |
value: 9.982000000000001 | |
- type: recall_at_10 | |
value: 29.325000000000003 | |
- type: recall_at_100 | |
value: 59.181 | |
- type: recall_at_1000 | |
value: 82.095 | |
- type: recall_at_3 | |
value: 17.338 | |
- type: recall_at_5 | |
value: 22.216 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/emotion | |
name: MTEB EmotionClassification | |
config: default | |
split: test | |
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 | |
metrics: | |
- type: accuracy | |
value: 52.04500000000001 | |
- type: f1 | |
value: 47.32462453881906 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fever | |
name: MTEB FEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 78.68 | |
- type: map_at_10 | |
value: 86.207 | |
- type: map_at_100 | |
value: 86.375 | |
- type: map_at_1000 | |
value: 86.388 | |
- type: map_at_3 | |
value: 85.35199999999999 | |
- type: map_at_5 | |
value: 85.954 | |
- type: mrr_at_1 | |
value: 84.923 | |
- type: mrr_at_10 | |
value: 90.902 | |
- type: mrr_at_100 | |
value: 90.952 | |
- type: mrr_at_1000 | |
value: 90.952 | |
- type: mrr_at_3 | |
value: 90.489 | |
- type: mrr_at_5 | |
value: 90.822 | |
- type: ndcg_at_1 | |
value: 84.923 | |
- type: ndcg_at_10 | |
value: 89.403 | |
- type: ndcg_at_100 | |
value: 90.023 | |
- type: ndcg_at_1000 | |
value: 90.235 | |
- type: ndcg_at_3 | |
value: 88.24300000000001 | |
- type: ndcg_at_5 | |
value: 89.005 | |
- type: precision_at_1 | |
value: 84.923 | |
- type: precision_at_10 | |
value: 10.495000000000001 | |
- type: precision_at_100 | |
value: 1.103 | |
- type: precision_at_1000 | |
value: 0.11399999999999999 | |
- type: precision_at_3 | |
value: 33.358 | |
- type: precision_at_5 | |
value: 20.579 | |
- type: recall_at_1 | |
value: 78.68 | |
- type: recall_at_10 | |
value: 94.622 | |
- type: recall_at_100 | |
value: 97.083 | |
- type: recall_at_1000 | |
value: 98.348 | |
- type: recall_at_3 | |
value: 91.499 | |
- type: recall_at_5 | |
value: 93.486 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fiqa | |
name: MTEB FiQA2018 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 25.781 | |
- type: map_at_10 | |
value: 44.669 | |
- type: map_at_100 | |
value: 46.831 | |
- type: map_at_1000 | |
value: 46.96 | |
- type: map_at_3 | |
value: 38.714 | |
- type: map_at_5 | |
value: 42.186 | |
- type: mrr_at_1 | |
value: 51.235 | |
- type: mrr_at_10 | |
value: 60.083 | |
- type: mrr_at_100 | |
value: 60.675999999999995 | |
- type: mrr_at_1000 | |
value: 60.706 | |
- type: mrr_at_3 | |
value: 57.665 | |
- type: mrr_at_5 | |
value: 59.084 | |
- type: ndcg_at_1 | |
value: 51.235 | |
- type: ndcg_at_10 | |
value: 53.111 | |
- type: ndcg_at_100 | |
value: 59.57900000000001 | |
- type: ndcg_at_1000 | |
value: 61.57 | |
- type: ndcg_at_3 | |
value: 48.397 | |
- type: ndcg_at_5 | |
value: 50.169 | |
- type: precision_at_1 | |
value: 51.235 | |
- type: precision_at_10 | |
value: 14.877 | |
- type: precision_at_100 | |
value: 2.173 | |
- type: precision_at_1000 | |
value: 0.253 | |
- type: precision_at_3 | |
value: 32.87 | |
- type: precision_at_5 | |
value: 24.29 | |
- type: recall_at_1 | |
value: 25.781 | |
- type: recall_at_10 | |
value: 61.464 | |
- type: recall_at_100 | |
value: 84.244 | |
- type: recall_at_1000 | |
value: 96.039 | |
- type: recall_at_3 | |
value: 44.105 | |
- type: recall_at_5 | |
value: 52.205999999999996 | |
- task: | |
type: Retrieval | |
dataset: | |
type: hotpotqa | |
name: MTEB HotpotQA | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 39.041 | |
- type: map_at_10 | |
value: 66.622 | |
- type: map_at_100 | |
value: 67.472 | |
- type: map_at_1000 | |
value: 67.52 | |
- type: map_at_3 | |
value: 62.81099999999999 | |
- type: map_at_5 | |
value: 65.23 | |
- type: mrr_at_1 | |
value: 78.082 | |
- type: mrr_at_10 | |
value: 83.827 | |
- type: mrr_at_100 | |
value: 84.03 | |
- type: mrr_at_1000 | |
value: 84.036 | |
- type: mrr_at_3 | |
value: 82.894 | |
- type: mrr_at_5 | |
value: 83.482 | |
- type: ndcg_at_1 | |
value: 78.082 | |
- type: ndcg_at_10 | |
value: 74.068 | |
- type: ndcg_at_100 | |
value: 76.981 | |
- type: ndcg_at_1000 | |
value: 77.887 | |
- type: ndcg_at_3 | |
value: 68.77600000000001 | |
- type: ndcg_at_5 | |
value: 71.763 | |
- type: precision_at_1 | |
value: 78.082 | |
- type: precision_at_10 | |
value: 15.822 | |
- type: precision_at_100 | |
value: 1.807 | |
- type: precision_at_1000 | |
value: 0.193 | |
- type: precision_at_3 | |
value: 44.956 | |
- type: precision_at_5 | |
value: 29.332 | |
- type: recall_at_1 | |
value: 39.041 | |
- type: recall_at_10 | |
value: 79.109 | |
- type: recall_at_100 | |
value: 90.371 | |
- type: recall_at_1000 | |
value: 96.313 | |
- type: recall_at_3 | |
value: 67.43400000000001 | |
- type: recall_at_5 | |
value: 73.329 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/imdb | |
name: MTEB ImdbClassification | |
config: default | |
split: test | |
revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 | |
metrics: | |
- type: accuracy | |
value: 87.422 | |
- type: ap | |
value: 83.07360776629146 | |
- type: f1 | |
value: 87.38583428778229 | |
- task: | |
type: Retrieval | |
dataset: | |
type: msmarco | |
name: MTEB MSMARCO | |
config: default | |
split: dev | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 21.715999999999998 | |
- type: map_at_10 | |
value: 34.821000000000005 | |
- type: map_at_100 | |
value: 36.022999999999996 | |
- type: map_at_1000 | |
value: 36.067 | |
- type: map_at_3 | |
value: 30.666 | |
- type: map_at_5 | |
value: 33.134 | |
- type: mrr_at_1 | |
value: 22.421 | |
- type: mrr_at_10 | |
value: 35.461 | |
- type: mrr_at_100 | |
value: 36.6 | |
- type: mrr_at_1000 | |
value: 36.638 | |
- type: mrr_at_3 | |
value: 31.413999999999998 | |
- type: mrr_at_5 | |
value: 33.823 | |
- type: ndcg_at_1 | |
value: 22.421 | |
- type: ndcg_at_10 | |
value: 42.169000000000004 | |
- type: ndcg_at_100 | |
value: 47.887 | |
- type: ndcg_at_1000 | |
value: 48.939 | |
- type: ndcg_at_3 | |
value: 33.786 | |
- type: ndcg_at_5 | |
value: 38.164 | |
- type: precision_at_1 | |
value: 22.421 | |
- type: precision_at_10 | |
value: 6.773999999999999 | |
- type: precision_at_100 | |
value: 0.962 | |
- type: precision_at_1000 | |
value: 0.105 | |
- type: precision_at_3 | |
value: 14.575 | |
- type: precision_at_5 | |
value: 10.963000000000001 | |
- type: recall_at_1 | |
value: 21.715999999999998 | |
- type: recall_at_10 | |
value: 64.75999999999999 | |
- type: recall_at_100 | |
value: 91.015 | |
- type: recall_at_1000 | |
value: 98.96000000000001 | |
- type: recall_at_3 | |
value: 42.089999999999996 | |
- type: recall_at_5 | |
value: 52.578 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_domain | |
name: MTEB MTOPDomainClassification (en) | |
config: en | |
split: test | |
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf | |
metrics: | |
- type: accuracy | |
value: 96.04195166438669 | |
- type: f1 | |
value: 95.76962987454031 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_intent | |
name: MTEB MTOPIntentClassification (en) | |
config: en | |
split: test | |
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | |
metrics: | |
- type: accuracy | |
value: 84.76744186046513 | |
- type: f1 | |
value: 70.3328215706764 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_intent | |
name: MTEB MassiveIntentClassification (en) | |
config: en | |
split: test | |
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | |
metrics: | |
- type: accuracy | |
value: 79.29051782111635 | |
- type: f1 | |
value: 77.0837414890434 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_scenario | |
name: MTEB MassiveScenarioClassification (en) | |
config: en | |
split: test | |
revision: 7d571f92784cd94a019292a1f45445077d0ef634 | |
metrics: | |
- type: accuracy | |
value: 81.64425016812373 | |
- type: f1 | |
value: 81.36288379329044 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-p2p | |
name: MTEB MedrxivClusteringP2P | |
config: default | |
split: test | |
revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 | |
metrics: | |
- type: v_measure | |
value: 31.0673311773222 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-s2s | |
name: MTEB MedrxivClusteringS2S | |
config: default | |
split: test | |
revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 | |
metrics: | |
- type: v_measure | |
value: 31.266850505047234 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/mind_small | |
name: MTEB MindSmallReranking | |
config: default | |
split: test | |
revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 | |
metrics: | |
- type: map | |
value: 31.49575275757744 | |
- type: mrr | |
value: 32.64979714009148 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nfcorpus | |
name: MTEB NFCorpus | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 6.151 | |
- type: map_at_10 | |
value: 14.879999999999999 | |
- type: map_at_100 | |
value: 19.445999999999998 | |
- type: map_at_1000 | |
value: 21.101 | |
- type: map_at_3 | |
value: 10.613999999999999 | |
- type: map_at_5 | |
value: 12.709000000000001 | |
- type: mrr_at_1 | |
value: 51.393 | |
- type: mrr_at_10 | |
value: 59.935 | |
- type: mrr_at_100 | |
value: 60.455000000000005 | |
- type: mrr_at_1000 | |
value: 60.485 | |
- type: mrr_at_3 | |
value: 57.894999999999996 | |
- type: mrr_at_5 | |
value: 59.303 | |
- type: ndcg_at_1 | |
value: 50.0 | |
- type: ndcg_at_10 | |
value: 39.324999999999996 | |
- type: ndcg_at_100 | |
value: 37.133 | |
- type: ndcg_at_1000 | |
value: 45.663 | |
- type: ndcg_at_3 | |
value: 45.294000000000004 | |
- type: ndcg_at_5 | |
value: 42.88 | |
- type: precision_at_1 | |
value: 51.393 | |
- type: precision_at_10 | |
value: 29.412 | |
- type: precision_at_100 | |
value: 9.666 | |
- type: precision_at_1000 | |
value: 2.263 | |
- type: precision_at_3 | |
value: 42.415000000000006 | |
- type: precision_at_5 | |
value: 37.399 | |
- type: recall_at_1 | |
value: 6.151 | |
- type: recall_at_10 | |
value: 19.121 | |
- type: recall_at_100 | |
value: 39.012 | |
- type: recall_at_1000 | |
value: 70.726 | |
- type: recall_at_3 | |
value: 11.855 | |
- type: recall_at_5 | |
value: 15.204 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nq | |
name: MTEB NQ | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 36.382 | |
- type: map_at_10 | |
value: 53.657 | |
- type: map_at_100 | |
value: 54.547999999999995 | |
- type: map_at_1000 | |
value: 54.562999999999995 | |
- type: map_at_3 | |
value: 49.236999999999995 | |
- type: map_at_5 | |
value: 51.949 | |
- type: mrr_at_1 | |
value: 41.309000000000005 | |
- type: mrr_at_10 | |
value: 56.25599999999999 | |
- type: mrr_at_100 | |
value: 56.855999999999995 | |
- type: mrr_at_1000 | |
value: 56.867000000000004 | |
- type: mrr_at_3 | |
value: 52.891999999999996 | |
- type: mrr_at_5 | |
value: 54.99699999999999 | |
- type: ndcg_at_1 | |
value: 41.28 | |
- type: ndcg_at_10 | |
value: 61.702999999999996 | |
- type: ndcg_at_100 | |
value: 65.092 | |
- type: ndcg_at_1000 | |
value: 65.392 | |
- type: ndcg_at_3 | |
value: 53.722 | |
- type: ndcg_at_5 | |
value: 58.11300000000001 | |
- type: precision_at_1 | |
value: 41.28 | |
- type: precision_at_10 | |
value: 10.014000000000001 | |
- type: precision_at_100 | |
value: 1.187 | |
- type: precision_at_1000 | |
value: 0.121 | |
- type: precision_at_3 | |
value: 24.614 | |
- type: precision_at_5 | |
value: 17.317 | |
- type: recall_at_1 | |
value: 36.382 | |
- type: recall_at_10 | |
value: 83.38600000000001 | |
- type: recall_at_100 | |
value: 97.528 | |
- type: recall_at_1000 | |
value: 99.696 | |
- type: recall_at_3 | |
value: 63.053000000000004 | |
- type: recall_at_5 | |
value: 73.16 | |
- task: | |
type: Retrieval | |
dataset: | |
type: quora | |
name: MTEB QuoraRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 69.577 | |
- type: map_at_10 | |
value: 83.944 | |
- type: map_at_100 | |
value: 84.604 | |
- type: map_at_1000 | |
value: 84.61800000000001 | |
- type: map_at_3 | |
value: 80.93599999999999 | |
- type: map_at_5 | |
value: 82.812 | |
- type: mrr_at_1 | |
value: 80.4 | |
- type: mrr_at_10 | |
value: 86.734 | |
- type: mrr_at_100 | |
value: 86.851 | |
- type: mrr_at_1000 | |
value: 86.85199999999999 | |
- type: mrr_at_3 | |
value: 85.75500000000001 | |
- type: mrr_at_5 | |
value: 86.396 | |
- type: ndcg_at_1 | |
value: 80.43 | |
- type: ndcg_at_10 | |
value: 87.75 | |
- type: ndcg_at_100 | |
value: 88.999 | |
- type: ndcg_at_1000 | |
value: 89.092 | |
- type: ndcg_at_3 | |
value: 84.88 | |
- type: ndcg_at_5 | |
value: 86.416 | |
- type: precision_at_1 | |
value: 80.43 | |
- type: precision_at_10 | |
value: 13.453000000000001 | |
- type: precision_at_100 | |
value: 1.539 | |
- type: precision_at_1000 | |
value: 0.157 | |
- type: precision_at_3 | |
value: 37.403 | |
- type: precision_at_5 | |
value: 24.648 | |
- type: recall_at_1 | |
value: 69.577 | |
- type: recall_at_10 | |
value: 95.233 | |
- type: recall_at_100 | |
value: 99.531 | |
- type: recall_at_1000 | |
value: 99.984 | |
- type: recall_at_3 | |
value: 86.867 | |
- type: recall_at_5 | |
value: 91.254 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering | |
name: MTEB RedditClustering | |
config: default | |
split: test | |
revision: 24640382cdbf8abc73003fb0fa6d111a705499eb | |
metrics: | |
- type: v_measure | |
value: 60.23690763558931 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering-p2p | |
name: MTEB RedditClusteringP2P | |
config: default | |
split: test | |
revision: 282350215ef01743dc01b456c7f5241fa8937f16 | |
metrics: | |
- type: v_measure | |
value: 64.12391112159126 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scidocs | |
name: MTEB SCIDOCS | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 5.288 | |
- type: map_at_10 | |
value: 13.611999999999998 | |
- type: map_at_100 | |
value: 15.909 | |
- type: map_at_1000 | |
value: 16.235 | |
- type: map_at_3 | |
value: 9.644 | |
- type: map_at_5 | |
value: 11.559 | |
- type: mrr_at_1 | |
value: 26.1 | |
- type: mrr_at_10 | |
value: 37.571 | |
- type: mrr_at_100 | |
value: 38.72 | |
- type: mrr_at_1000 | |
value: 38.76 | |
- type: mrr_at_3 | |
value: 34.383 | |
- type: mrr_at_5 | |
value: 36.187999999999995 | |
- type: ndcg_at_1 | |
value: 26.1 | |
- type: ndcg_at_10 | |
value: 22.497 | |
- type: ndcg_at_100 | |
value: 31.098 | |
- type: ndcg_at_1000 | |
value: 36.434 | |
- type: ndcg_at_3 | |
value: 21.401 | |
- type: ndcg_at_5 | |
value: 18.66 | |
- type: precision_at_1 | |
value: 26.1 | |
- type: precision_at_10 | |
value: 11.67 | |
- type: precision_at_100 | |
value: 2.405 | |
- type: precision_at_1000 | |
value: 0.368 | |
- type: precision_at_3 | |
value: 20.0 | |
- type: precision_at_5 | |
value: 16.34 | |
- type: recall_at_1 | |
value: 5.288 | |
- type: recall_at_10 | |
value: 23.652 | |
- type: recall_at_100 | |
value: 48.79 | |
- type: recall_at_1000 | |
value: 74.703 | |
- type: recall_at_3 | |
value: 12.158 | |
- type: recall_at_5 | |
value: 16.582 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sickr-sts | |
name: MTEB SICK-R | |
config: default | |
split: test | |
revision: a6ea5a8cab320b040a23452cc28066d9beae2cee | |
metrics: | |
- type: cos_sim_spearman | |
value: 83.6969699802343 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts12-sts | |
name: MTEB STS12 | |
config: default | |
split: test | |
revision: a0d554a64d88156834ff5ae9920b964011b16384 | |
metrics: | |
- type: cos_sim_spearman | |
value: 78.8031221769135 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts13-sts | |
name: MTEB STS13 | |
config: default | |
split: test | |
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca | |
metrics: | |
- type: cos_sim_spearman | |
value: 86.37435789895171 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts14-sts | |
name: MTEB STS14 | |
config: default | |
split: test | |
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 | |
metrics: | |
- type: cos_sim_spearman | |
value: 84.04036612478626 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts15-sts | |
name: MTEB STS15 | |
config: default | |
split: test | |
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 | |
metrics: | |
- type: cos_sim_spearman | |
value: 88.99055778929946 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts16-sts | |
name: MTEB STS16 | |
config: default | |
split: test | |
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 | |
metrics: | |
- type: cos_sim_spearman | |
value: 87.22140434759893 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts17-crosslingual-sts | |
name: MTEB STS17 (en-en) | |
config: en-en | |
split: test | |
revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d | |
metrics: | |
- type: cos_sim_spearman | |
value: 90.1862731405498 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts22-crosslingual-sts | |
name: MTEB STS22 (en) | |
config: en | |
split: test | |
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 | |
metrics: | |
- type: cos_sim_spearman | |
value: 67.67995229420237 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/stsbenchmark-sts | |
name: MTEB STSBenchmark | |
config: default | |
split: test | |
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 | |
metrics: | |
- type: cos_sim_spearman | |
value: 88.65370934976113 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/scidocs-reranking | |
name: MTEB SciDocsRR | |
config: default | |
split: test | |
revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab | |
metrics: | |
- type: map | |
value: 83.79832393152147 | |
- type: mrr | |
value: 95.78404438698557 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scifact | |
name: MTEB SciFact | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 64.883 | |
- type: map_at_10 | |
value: 74.48 | |
- type: map_at_100 | |
value: 74.85000000000001 | |
- type: map_at_1000 | |
value: 74.861 | |
- type: map_at_3 | |
value: 71.596 | |
- type: map_at_5 | |
value: 73.545 | |
- type: mrr_at_1 | |
value: 67.667 | |
- type: mrr_at_10 | |
value: 75.394 | |
- type: mrr_at_100 | |
value: 75.644 | |
- type: mrr_at_1000 | |
value: 75.655 | |
- type: mrr_at_3 | |
value: 73.5 | |
- type: mrr_at_5 | |
value: 74.63300000000001 | |
- type: ndcg_at_1 | |
value: 67.667 | |
- type: ndcg_at_10 | |
value: 78.855 | |
- type: ndcg_at_100 | |
value: 80.361 | |
- type: ndcg_at_1000 | |
value: 80.624 | |
- type: ndcg_at_3 | |
value: 74.37899999999999 | |
- type: ndcg_at_5 | |
value: 76.89200000000001 | |
- type: precision_at_1 | |
value: 67.667 | |
- type: precision_at_10 | |
value: 10.267 | |
- type: precision_at_100 | |
value: 1.11 | |
- type: precision_at_1000 | |
value: 0.11299999999999999 | |
- type: precision_at_3 | |
value: 28.778 | |
- type: precision_at_5 | |
value: 19.133 | |
- type: recall_at_1 | |
value: 64.883 | |
- type: recall_at_10 | |
value: 91.2 | |
- type: recall_at_100 | |
value: 98.0 | |
- type: recall_at_1000 | |
value: 100.0 | |
- type: recall_at_3 | |
value: 79.406 | |
- type: recall_at_5 | |
value: 85.578 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/sprintduplicatequestions-pairclassification | |
name: MTEB SprintDuplicateQuestions | |
config: default | |
split: test | |
revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 99.85445544554456 | |
- type: cos_sim_ap | |
value: 96.81785428870712 | |
- type: cos_sim_f1 | |
value: 92.67563527653213 | |
- type: cos_sim_precision | |
value: 92.35352532274081 | |
- type: cos_sim_recall | |
value: 93.0 | |
- type: dot_accuracy | |
value: 99.75643564356436 | |
- type: dot_ap | |
value: 94.46746929160422 | |
- type: dot_f1 | |
value: 87.74900398406375 | |
- type: dot_precision | |
value: 87.40079365079364 | |
- type: dot_recall | |
value: 88.1 | |
- type: euclidean_accuracy | |
value: 99.85445544554456 | |
- type: euclidean_ap | |
value: 96.59180137299155 | |
- type: euclidean_f1 | |
value: 92.48850281042411 | |
- type: euclidean_precision | |
value: 94.56635318704284 | |
- type: euclidean_recall | |
value: 90.5 | |
- type: manhattan_accuracy | |
value: 99.85643564356435 | |
- type: manhattan_ap | |
value: 96.66599616275849 | |
- type: manhattan_f1 | |
value: 92.69746646795828 | |
- type: manhattan_precision | |
value: 92.10266535044423 | |
- type: manhattan_recall | |
value: 93.30000000000001 | |
- type: max_accuracy | |
value: 99.85643564356435 | |
- type: max_ap | |
value: 96.81785428870712 | |
- type: max_f1 | |
value: 92.69746646795828 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering | |
name: MTEB StackExchangeClustering | |
config: default | |
split: test | |
revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 | |
metrics: | |
- type: v_measure | |
value: 70.72970157362414 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering-p2p | |
name: MTEB StackExchangeClusteringP2P | |
config: default | |
split: test | |
revision: 815ca46b2622cec33ccafc3735d572c266efdb44 | |
metrics: | |
- type: v_measure | |
value: 34.49706344517027 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/stackoverflowdupquestions-reranking | |
name: MTEB StackOverflowDupQuestions | |
config: default | |
split: test | |
revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 | |
metrics: | |
- type: map | |
value: 54.41010678297881 | |
- type: mrr | |
value: 55.15095811051693 | |
- task: | |
type: Summarization | |
dataset: | |
type: mteb/summeval | |
name: MTEB SummEval | |
config: default | |
split: test | |
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c | |
metrics: | |
- type: cos_sim_pearson | |
value: 30.5030094989814 | |
- type: cos_sim_spearman | |
value: 29.959138274084797 | |
- type: dot_pearson | |
value: 29.740134155639076 | |
- type: dot_spearman | |
value: 29.18174652067779 | |
- task: | |
type: Retrieval | |
dataset: | |
type: trec-covid | |
name: MTEB TRECCOVID | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.22200000000000003 | |
- type: map_at_10 | |
value: 1.925 | |
- type: map_at_100 | |
value: 13.150999999999998 | |
- type: map_at_1000 | |
value: 33.410000000000004 | |
- type: map_at_3 | |
value: 0.631 | |
- type: map_at_5 | |
value: 0.9990000000000001 | |
- type: mrr_at_1 | |
value: 82.0 | |
- type: mrr_at_10 | |
value: 90.0 | |
- type: mrr_at_100 | |
value: 90.0 | |
- type: mrr_at_1000 | |
value: 90.0 | |
- type: mrr_at_3 | |
value: 89.0 | |
- type: mrr_at_5 | |
value: 90.0 | |
- type: ndcg_at_1 | |
value: 79.0 | |
- type: ndcg_at_10 | |
value: 77.69200000000001 | |
- type: ndcg_at_100 | |
value: 64.89 | |
- type: ndcg_at_1000 | |
value: 59.748999999999995 | |
- type: ndcg_at_3 | |
value: 79.296 | |
- type: ndcg_at_5 | |
value: 78.63 | |
- type: precision_at_1 | |
value: 82.0 | |
- type: precision_at_10 | |
value: 82.19999999999999 | |
- type: precision_at_100 | |
value: 67.52 | |
- type: precision_at_1000 | |
value: 26.512 | |
- type: precision_at_3 | |
value: 83.333 | |
- type: precision_at_5 | |
value: 83.2 | |
- type: recall_at_1 | |
value: 0.22200000000000003 | |
- type: recall_at_10 | |
value: 2.164 | |
- type: recall_at_100 | |
value: 16.608 | |
- type: recall_at_1000 | |
value: 56.89999999999999 | |
- type: recall_at_3 | |
value: 0.658 | |
- type: recall_at_5 | |
value: 1.084 | |
- task: | |
type: Retrieval | |
dataset: | |
type: webis-touche2020 | |
name: MTEB Touche2020 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 1.8519999999999999 | |
- type: map_at_10 | |
value: 8.569 | |
- type: map_at_100 | |
value: 14.238999999999999 | |
- type: map_at_1000 | |
value: 15.876000000000001 | |
- type: map_at_3 | |
value: 3.9859999999999998 | |
- type: map_at_5 | |
value: 5.785 | |
- type: mrr_at_1 | |
value: 26.531 | |
- type: mrr_at_10 | |
value: 40.581 | |
- type: mrr_at_100 | |
value: 41.379 | |
- type: mrr_at_1000 | |
value: 41.388999999999996 | |
- type: mrr_at_3 | |
value: 35.034 | |
- type: mrr_at_5 | |
value: 38.299 | |
- type: ndcg_at_1 | |
value: 25.509999999999998 | |
- type: ndcg_at_10 | |
value: 22.18 | |
- type: ndcg_at_100 | |
value: 34.695 | |
- type: ndcg_at_1000 | |
value: 46.854 | |
- type: ndcg_at_3 | |
value: 23.112 | |
- type: ndcg_at_5 | |
value: 23.089000000000002 | |
- type: precision_at_1 | |
value: 26.531 | |
- type: precision_at_10 | |
value: 20.408 | |
- type: precision_at_100 | |
value: 7.428999999999999 | |
- type: precision_at_1000 | |
value: 1.559 | |
- type: precision_at_3 | |
value: 23.810000000000002 | |
- type: precision_at_5 | |
value: 23.265 | |
- type: recall_at_1 | |
value: 1.8519999999999999 | |
- type: recall_at_10 | |
value: 15.038000000000002 | |
- type: recall_at_100 | |
value: 46.499 | |
- type: recall_at_1000 | |
value: 84.11800000000001 | |
- type: recall_at_3 | |
value: 5.179 | |
- type: recall_at_5 | |
value: 8.758000000000001 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/toxic_conversations_50k | |
name: MTEB ToxicConversationsClassification | |
config: default | |
split: test | |
revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c | |
metrics: | |
- type: accuracy | |
value: 69.26140000000001 | |
- type: ap | |
value: 14.138284541193421 | |
- type: f1 | |
value: 53.715363590501916 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/tweet_sentiment_extraction | |
name: MTEB TweetSentimentExtractionClassification | |
config: default | |
split: test | |
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a | |
metrics: | |
- type: accuracy | |
value: 62.136389360498015 | |
- type: f1 | |
value: 62.33290824449911 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/twentynewsgroups-clustering | |
name: MTEB TwentyNewsgroupsClustering | |
config: default | |
split: test | |
revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 | |
metrics: | |
- type: v_measure | |
value: 52.18306009684791 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twittersemeval2015-pairclassification | |
name: MTEB TwitterSemEval2015 | |
config: default | |
split: test | |
revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 88.27561542588067 | |
- type: cos_sim_ap | |
value: 80.59558041410928 | |
- type: cos_sim_f1 | |
value: 73.54724608388075 | |
- type: cos_sim_precision | |
value: 70.55259331071255 | |
- type: cos_sim_recall | |
value: 76.80738786279684 | |
- type: dot_accuracy | |
value: 85.00923883888657 | |
- type: dot_ap | |
value: 71.76942851966301 | |
- type: dot_f1 | |
value: 66.84518013631937 | |
- type: dot_precision | |
value: 62.042476276547674 | |
- type: dot_recall | |
value: 72.45382585751979 | |
- type: euclidean_accuracy | |
value: 88.26965488466352 | |
- type: euclidean_ap | |
value: 80.44398056118867 | |
- type: euclidean_f1 | |
value: 73.28244274809161 | |
- type: euclidean_precision | |
value: 68.69806094182826 | |
- type: euclidean_recall | |
value: 78.52242744063325 | |
- type: manhattan_accuracy | |
value: 88.25773380222924 | |
- type: manhattan_ap | |
value: 80.25000483445007 | |
- type: manhattan_f1 | |
value: 73.10447023956533 | |
- type: manhattan_precision | |
value: 68.70937790157846 | |
- type: manhattan_recall | |
value: 78.10026385224275 | |
- type: max_accuracy | |
value: 88.27561542588067 | |
- type: max_ap | |
value: 80.59558041410928 | |
- type: max_f1 | |
value: 73.54724608388075 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twitterurlcorpus-pairclassification | |
name: MTEB TwitterURLCorpus | |
config: default | |
split: test | |
revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf | |
metrics: | |
- type: cos_sim_accuracy | |
value: 89.52536189700004 | |
- type: cos_sim_ap | |
value: 86.55972191277392 | |
- type: cos_sim_f1 | |
value: 79.31733569243245 | |
- type: cos_sim_precision | |
value: 76.08372816632487 | |
- type: cos_sim_recall | |
value: 82.83800431167231 | |
- type: dot_accuracy | |
value: 87.77506112469437 | |
- type: dot_ap | |
value: 82.92833178514168 | |
- type: dot_f1 | |
value: 76.12050479839702 | |
- type: dot_precision | |
value: 70.03687172520861 | |
- type: dot_recall | |
value: 83.3615645210964 | |
- type: euclidean_accuracy | |
value: 89.3643031784841 | |
- type: euclidean_ap | |
value: 86.45902920741383 | |
- type: euclidean_f1 | |
value: 79.4788514062154 | |
- type: euclidean_precision | |
value: 76.32922160782645 | |
- type: euclidean_recall | |
value: 82.89959963042809 | |
- type: manhattan_accuracy | |
value: 89.38564830985369 | |
- type: manhattan_ap | |
value: 86.47558438668958 | |
- type: manhattan_f1 | |
value: 79.46758328152997 | |
- type: manhattan_precision | |
value: 75.67379343965457 | |
- type: manhattan_recall | |
value: 83.66184170003079 | |
- type: max_accuracy | |
value: 89.52536189700004 | |
- type: max_ap | |
value: 86.55972191277392 | |
- type: max_f1 | |
value: 79.4788514062154 | |
# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
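
To make the first step of the recipe concrete, the sketch below contrasts a causal attention mask with a bidirectional one. It is an illustration in plain Python only, not the library's implementation; the function names `causal_mask` and `bidirectional_mask` are hypothetical and do not come from `llm2vec`.

```python
def causal_mask(n):
    """Causal mask: token i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Bidirectional mask: every token may attend to every position."""
    return [[1] * n for _ in range(n)]


# Print both masks for a 4-token sequence (1 = attention allowed).
print("causal:")
for row in causal_mask(4):
    print(row)

print("bidirectional:")
for row in bidirectional_mask(4):
    print(row)
```

In the causal mask the first token sees only itself, so its representation cannot reflect later context; removing that restriction is what lets every position contribute to a sequence-level embedding.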
## Installation

```bash
pip install llm2vec
```

## Usage
```python
from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Load the base Mistral model, along with custom code that enables bidirectional
# connections in decoder-only LLMs. MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
model = PeftModel.from_pretrained(
    model,
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
)
model = model.merge_and_unload()  # This can take several minutes on CPU

# Load the supervised model. This loads the trained LoRA weights on top of the MNTP model.
# Hence the final weights are: base model + MNTP (LoRA) + supervised (LoRA).
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised"
)

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encode queries using instructions
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encode documents. Instructions are not required for documents
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)
"""
tensor([[0.5485, 0.0551],
        [0.0565, 0.5425]])
"""
```
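
In the similarity matrix printed above, each row is a query and each column is a document, so retrieval amounts to sorting each row by descending score. The sketch below shows that step in plain Python, using the scores copied from the printed output; `rank_documents` is a hypothetical helper and not part of `llm2vec`.

```python
# Scores copied from the printed output above:
# rows are queries, columns are documents.
cos_sim = [
    [0.5485, 0.0551],
    [0.0565, 0.5425],
]


def rank_documents(scores):
    """For each query row, return document indices sorted by descending score."""
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        for row in scores
    ]


rankings = rank_documents(cos_sim)
for q, order in enumerate(rankings):
    best = order[0]
    print(f"query {q}: best document {best} (score {cos_sim[q][best]:.4f})")
```

When the scores are still held in a `torch` tensor, `cos_sim.argsort(dim=1, descending=True)` should give the same ordering without leaving the tensor API.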
## Questions

If you have any questions about the code, feel free to email Parishad (`[email protected]`) or Vaibhav (`[email protected]`).