all-mpnet-base-v2-pair_score

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
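
The stack above is an MPNet encoder followed by mean pooling over token embeddings and L2 normalization. For reference, the same three steps can be reproduced with plain 🤗 Transformers roughly as follows; this is a minimal sketch, and it assumes the weights load directly from the youssefkhalil320/all-mpnet-base-v2-pairscore repository:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

repo_id = "youssefkhalil320/all-mpnet-base-v2-pairscore"  # model repository on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

sentences = ["jeremy hush book", "chinese jumper"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=384, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded)

embeddings = mean_pooling(model_output, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # matches the Normalize() module
print(embeddings.shape)  # torch.Size([2, 768])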

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("youssefkhalil320/all-mpnet-base-v2-pairscore")
# Run inference
sentences = [
    'jeremy hush book',
    'chinese jumper',
    'perfume',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
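
Because the embeddings are L2-normalized and the configured similarity function is cosine similarity, each entry of similarities can be read as a pair score in [-1, 1]. A minimal sketch for scoring a single pair (the second string is a made-up item description, not taken from the training data):

emb = model.encode(["jeremy hush book", "illustrated art book"])
score = model.similarity(emb[0], emb[1])  # cosine similarity of the two embeddings
print(float(score))  # closer to 1.0 means the pair is judged more related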

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
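
For orientation, the hyperparameters listed above map onto the Sentence Transformers trainer roughly as follows. This is a reconstruction sketch rather than the published training script: the dataset and its column names are placeholders, and the use of CoSENTLoss is inferred from the citation section below.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Placeholder (sentence1, sentence2, score) pairs; the real training data is not described in this card
train_dataset = Dataset.from_dict({
    "sentence1": ["jeremy hush book", "chinese jumper"],
    "sentence2": ["illustrated art book", "perfume"],
    "score": [1.0, 0.0],
})
eval_dataset = Dataset.from_dict({
    "sentence1": ["perfume"],
    "sentence2": ["chinese jumper"],
    "score": [0.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="all-mpnet-base-v2-pair_score",
    num_train_epochs=2,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,  # as in the card; requires a GPU
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CoSENTLoss(model),
)
trainer.train()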

Training Logs

Epoch   Step    Training Loss   Validation Loss
0.0094 100 16.0755 -
0.0188 200 13.0643 -
0.0282 300 9.3474 -
0.0376 400 8.2606 -
0.0469 500 8.084 -
0.0563 600 8.0581 -
0.0657 700 8.0175 -
0.0751 800 8.0285 -
0.0845 900 8.0024 -
0.0939 1000 8.0161 -
0.1033 1100 7.9941 -
0.1127 1200 8.0233 -
0.1221 1300 8.0141 -
0.1314 1400 7.9644 -
0.1408 1500 8.0311 -
0.1502 1600 8.0306 -
0.1596 1700 7.989 -
0.1690 1800 8.0034 -
0.1784 1900 8.0107 -
0.1878 2000 7.9737 -
0.1972 2100 7.9827 -
0.2066 2200 8.0389 -
0.2159 2300 7.973 -
0.2253 2400 7.9669 -
0.2347 2500 8.0296 -
0.2441 2600 7.9984 -
0.2535 2700 7.9772 -
0.2629 2800 7.9838 -
0.2723 2900 7.9816 -
0.2817 3000 8.0021 -
0.2911 3100 7.9715 -
0.3004 3200 7.9809 -
0.3098 3300 7.9849 -
0.3192 3400 7.9463 -
0.3286 3500 8.0067 -
0.3380 3600 7.9431 -
0.3474 3700 7.9877 -
0.3568 3800 7.9494 -
0.3662 3900 7.9466 -
0.3756 4000 7.9708 -
0.3849 4100 7.9525 -
0.3943 4200 7.9322 -
0.4037 4300 7.9415 -
0.4131 4400 7.9932 -
0.4225 4500 7.9481 -
0.4319 4600 7.976 -
0.4413 4700 7.971 -
0.4507 4800 7.9647 -
0.4601 4900 7.9217 -
0.4694 5000 7.9374 7.9518
0.4788 5100 7.9026 -
0.4882 5200 7.9304 -
0.4976 5300 7.9148 -
0.5070 5400 7.9538 -
0.5164 5500 8.0002 -
0.5258 5600 7.9571 -
0.5352 5700 7.932 -
0.5445 5800 7.9047 -
0.5539 5900 7.9353 -
0.5633 6000 7.9203 -
0.5727 6100 7.8967 -
0.5821 6200 7.9414 -
0.5915 6300 7.9631 -
0.6009 6400 7.9606 -
0.6103 6500 7.9377 -
0.6197 6600 7.9108 -
0.6290 6700 7.9225 -
0.6384 6800 7.9154 -
0.6478 6900 7.9191 -
0.6572 7000 7.8903 -
0.6666 7100 7.9213 -
0.6760 7200 7.9202 -
0.6854 7300 7.8998 -
0.6948 7400 7.9153 -
0.7042 7500 7.9037 -
0.7135 7600 7.9146 -
0.7229 7700 7.8972 -
0.7323 7800 7.9374 -
0.7417 7900 7.8647 -
0.7511 8000 7.8915 -
0.7605 8100 7.8846 -
0.7699 8200 7.8988 -
0.7793 8300 7.8702 -
0.7887 8400 7.923 -
0.7980 8500 7.891 -
0.8074 8600 7.8832 -
0.8168 8700 7.8726 -
0.8262 8800 7.8813 -
0.8356 8900 7.8986 -
0.8450 9000 7.8743 -
0.8544 9100 7.8791 -
0.8638 9200 7.8783 -
0.8732 9300 7.8528 -
0.8825 9400 7.8864 -
0.8919 9500 7.8989 -
0.9013 9600 7.8617 -
0.9107 9700 7.8371 -
0.9201 9800 7.8566 -
0.9295 9900 7.8776 -
0.9389 10000 7.8558 7.8492
0.9483 10100 7.848 -
0.9577 10200 7.8227 -
0.9670 10300 7.8311 -
0.9764 10400 7.8437 -
0.9858 10500 7.8454 -
0.9952 10600 7.8362 -
1.0046 10700 7.8681 -
1.0140 10800 7.8745 -
1.0234 10900 7.8339 -
1.0328 11000 7.8458 -
1.0422 11100 7.8493 -
1.0515 11200 7.8317 -
1.0609 11300 7.841 -
1.0703 11400 7.8292 -
1.0797 11500 7.8121 -
1.0891 11600 7.8165 -
1.0985 11700 7.8259 -
1.1079 11800 7.8303 -
1.1173 11900 7.809 -
1.1267 12000 7.818 -
1.1360 12100 7.8071 -
1.1454 12200 7.801 -
1.1548 12300 7.8123 -
1.1642 12400 7.8203 -
1.1736 12500 7.8609 -
1.1830 12600 7.7782 -
1.1924 12700 7.8092 -
1.2018 12800 7.815 -
1.2112 12900 7.8196 -
1.2205 13000 7.8206 -
1.2299 13100 7.8022 -
1.2393 13200 7.8043 -
1.2487 13300 7.7823 -
1.2581 13400 7.8061 -
1.2675 13500 7.8016 -
1.2769 13600 7.8076 -
1.2863 13700 7.7996 -
1.2957 13800 7.8035 -
1.3050 13900 7.8092 -
1.3144 14000 7.7902 -
1.3238 14100 7.8114 -
1.3332 14200 7.8112 -
1.3426 14300 7.8036 -
1.3520 14400 7.8178 -
1.3614 14500 7.8391 -
1.3708 14600 7.8151 -
1.3802 14700 7.7957 -
1.3895 14800 7.7833 -
1.3989 14900 7.8049 -
1.4083 15000 7.8163 7.8078
1.4177 15100 7.7864 -
1.4271 15200 7.8241 -
1.4365 15300 7.7694 -
1.4459 15400 7.7784 -
1.4553 15500 7.7628 -
1.4647 15600 7.8044 -
1.4740 15700 7.7871 -
1.4834 15800 7.809 -
1.4928 15900 7.7955 -
1.5022 16000 7.8056 -
1.5116 16100 7.774 -
1.5210 16200 7.7874 -
1.5304 16300 7.7918 -
1.5398 16400 7.7787 -
1.5492 16500 7.7881 -
1.5585 16600 7.7723 -
1.5679 16700 7.7809 -
1.5773 16800 7.8096 -
1.5867 16900 7.7559 -
1.5961 17000 7.8063 -
1.6055 17100 7.8137 -
1.6149 17200 7.761 -
1.6243 17300 7.7672 -
1.6336 17400 7.7939 -
1.6430 17500 7.8052 -
1.6524 17600 7.7519 -
1.6618 17700 7.7643 -
1.6712 17800 7.7823 -
1.6806 17900 7.7507 -
1.6900 18000 7.777 -
1.6994 18100 7.786 -
1.7088 18200 7.8097 -
1.7181 18300 7.7749 -
1.7275 18400 7.7626 -
1.7369 18500 7.7783 -
1.7463 18600 7.7552 -
1.7557 18700 7.7837 -
1.7651 18800 7.7583 -
1.7745 18900 7.7617 -
1.7839 19000 7.7649 -
1.7933 19100 7.7767 -
1.8026 19200 7.7565 -
1.8120 19300 7.7702 -
1.8214 19400 7.7552 -
1.8308 19500 7.7511 -
1.8402 19600 7.7818 -
1.8496 19700 7.7704 -
1.8590 19800 7.7824 -
1.8684 19900 7.751 -
1.8778 20000 7.7868 7.7942
1.8871 20100 7.7981 -
1.8965 20200 7.7673 -
1.9059 20300 7.7695 -
1.9153 20400 7.7587 -
1.9247 20500 7.7444 -
1.9341 20600 7.7736 -
1.9435 20700 7.7655 -
1.9529 20800 7.7686 -
1.9623 20900 7.7731 -
1.9716 21000 7.7527 -
1.9810 21100 7.7962 -
1.9904 21200 7.7676 -
1.9998 21300 7.7641 -

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}