MPNet base trained on AllNLI triplets

This is a sentence-transformers model finetuned from microsoft/mpnet-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/mpnet-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0
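The similarity function listed above is cosine similarity, which scores two embeddings by the angle between them and ignores their lengths. Below is a minimal pure-Python sketch of that formula. `cosine_similarity` is a hypothetical helper written for illustration and is not part of the sentence-transformers API. The toy 3-dimensional vectors stand in for the model's 768-dimensional embeddings.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Return the cosine of the angle between vectors u and v (range -1 to 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for real embeddings
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [-3.0, 0.0, 1.0]  # orthogonal to a
print(round(cosine_similarity(a, b), 6))  # 1.0
print(round(cosine_similarity(a, c), 6))  # 0.0
```

Because only direction matters, a long passage and a short query can still score highly when they point the same way in the embedding space.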

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
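The Pooling module is configured with `pooling_mode_mean_tokens: True`, so each sentence embedding is the average of the transformer's token embeddings. Padding positions are excluded through the attention mask. The sketch below illustrates that computation for a single sentence in pure Python. `mean_pool` is a hypothetical helper, and the library itself performs this step on batched PyTorch tensors.

```python
def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose attention-mask entry is 1; padding (0) is ignored."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need one mask entry per token")
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]


# Three real tokens plus one padding token, using toy 2-dimensional vectors
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```

The padding vector `[100.0, 100.0]` has no effect on the result, which is why batching sentences of different lengths does not change their embeddings.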

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jrinky/mpnet-base-all-nli-triplet")
# Run inference
sentences = [
    'What challenges do university researchers face when trying to turn their discoveries into commercial products',
    'Universities are vital to the process of innovation and advancement: they educate students who bring new ways of thinking to old problems, and they make new discoveries that no one else would make because no one else has the opportunity to delve so deeply. In creating this type of refuge, we also create a comfort zone. Because governmental support for science and technology is designed to support long-term, high-risk work regardless of immediate return, ROI is not a factor in getting government funding. University researchers become successful at pitching research ideas without serious reference to commercial outcome. Peer review – which is critical for the success of science – further reinforces this tendency. University researchers are rewarded for thinking in this very specific way, and this creates the comfort zone. As it dawns on a researcher that they may need to work with a company or an entrepreneur to see their discoveries become products or services that can benefit society, they may find themselves a victim of their own past success. Many researchers reflexively approach companies as if they are yet another type of funding agency, but since companies are not in the grant-making business, a partnership fails to materialize. This basic failure to communicate means valuable commercial opportunities are often not recognized, or when they are, the resulting partnership does not go well.',
    'A major shakeup has taken place at the top of the Boston Celtics. Danny Ainge has stepped down as president of basketball operations, and head coach Brad Stevens has stepped into the role. Stevens will now lead the search for a new coach. The team made the announcement early Wednesday, one day after the Celtics were eliminated by the Brooklyn Nets in the first round of the Eastern Conference playoffs. “Helping guide this organization has been the thrill of a lifetime, and having worked side-by-side with him since he’s been here, I know we couldn’t be in better hands than with Brad guiding the team going forward,” Ainge said in a statement. “I’m grateful to ownership, all of my Celtics colleagues, and the best fans in basketball for being part of the journey.”\nAinge, 62, is a franchise legend.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
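The similarity matrix can drive semantic search. In the snippet above, row 0 of `similarities` holds the query's scores against all three sentences, and sorting those scores ranks the candidate passages. The following is a pure-Python sketch of that ranking step on toy vectors. It assumes the embeddings have been converted to plain lists (for example with `embeddings.tolist()`). `top_k` and `cosine` are hypothetical helpers. sentence-transformers also provides its own utilities for this, such as `util.semantic_search`, which are not shown here.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(query: list[float], corpus: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (corpus index, cosine score) pairs for the k best matches, best first."""
    scored = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors: the query points roughly the same way as corpus[1]
query = [0.9, 0.1, 0.0]
corpus = [
    [0.0, 0.0, 1.0],  # unrelated passage
    [1.0, 0.2, 0.0],  # relevant passage
    [0.5, 0.5, 0.0],  # partially related passage
]
for index, score in top_k(query, corpus, k=2):
    print(index, round(score, 3))
```

A full pairwise scan like this is fine for small corpora. Larger collections usually call for an approximate nearest-neighbor index.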

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,433 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6 tokens, mean 16.21 tokens, max 42 tokens
    • positive (string): min 5 tokens, mean 140.69 tokens, max 512 tokens
  • Samples:
    • anchor: What type of event is being described by Pierre LeBrun in relation to the NHL
      positive: ESPN’s Pierre LeBrun said, “It's not just about one NHL game anymore. It's a week-long event.
    • anchor: Who designed the property's landscape and when was the building listed on the National Register of Historic Places
      positive: The property's landscape continues a circular theme, with flower beds, fencing, and parking arranged in concentric patterns around the structure. It was designed by the Washington, DC firm of Deigert & Yerkes. The building was listed on the National Register of Historic Places in 2017.
    • anchor: Is 'ladens' a valid word to use in Scrabble and other word games
      positive: Scrabble?! LADENSIs ladens valid for Scrabble? Words With Friends? Lexulous? WordFeud? Other games
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
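The card does not include the source of `selfloss.Infonce`. Assume it implements the standard InfoNCE objective with in-batch negatives. That is the formulation of the cited Henderson et al. paper, and also of sentence-transformers' `MultipleNegativesRankingLoss`, whose defaults are likewise scale 20.0 and cosine similarity. Under that assumption, each anchor is scored against every positive in the batch using cosine similarity multiplied by `scale`. Cross-entropy then raises the score of the anchor's own positive relative to the others. The pure-Python sketch below implements that assumed computation. `info_nce_loss` is a hypothetical name.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(anchors: list[list[float]], positives: list[list[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy in which anchor i must pick positive i out of the whole batch.

    Every other positive in the batch acts as a negative for anchor i (in-batch negatives).
    """
    if len(anchors) != len(positives):
        raise ValueError("need one positive per anchor")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled cosine similarity of anchor i to every positive in the batch
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching positive (index i) as the target class
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# A well-aligned batch (each anchor matches its own positive) versus a swapped one
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce_loss(anchors, aligned) < info_nce_loss(anchors, swapped))  # True
```

The scale of 20.0 sharpens the softmax over cosine scores, which are bounded in [-1, 1]. Without it, even a perfect match would receive only a modest share of the probability mass.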
    

Evaluation Dataset

Unnamed Dataset

  • Size: 804 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 804 samples:
    • anchor (string): min 7 tokens, mean 16.44 tokens, max 38 tokens
    • positive (string): min 8 tokens, mean 149.21 tokens, max 512 tokens
  • Samples:
    • anchor: What types of special events can the salon services be booked for
      positive: Our fabulous salon services are available at your special event! Whether it's a wedding, photo shoot, prom, or just a fun girls' night in- we do it all.
    • anchor: What material is the Hudson Baby plush hooded robe made of
      positive: Dimensions (Overall): 10 inches (L), 10 inches (H) x 1 inches (W)
        Weight: 1 pounds
        Textile Material: 100% Polyester
        • Animal face plush hooded bath robe. • Made with 100% plush coral fleece fabric
        • Soft and gentle on baby's skin
        • Optimal for everyday use
        • Affordable, high quality bath robe
        Hudson Baby plush hooded robe is made of super soft, cozy plush material to dry and warm baby after bath or pool time.
    • anchor: Where is this uncommon species thought to occur
      positive: It is also thought to occur in New Zealand. It is an uncommon species, growing in "heathy woodland [in] semi shade".
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 6
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
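The settings `learning_rate: 2e-05` and `warmup_ratio: 0.1`, combined with the default linear scheduler (`lr_scheduler_type: linear` in the full list), imply a specific schedule. The learning rate should rise linearly from 0 to 2e-05 over the first 10% of optimizer steps and then decay linearly back to 0. The training log suggests about 101 steps per epoch, since step 100 falls at epoch 0.9901. That gives roughly 606 steps over 6 epochs, and the sketch below takes this figure as an assumption. It follows the warmup rule used by Hugging Face `TrainingArguments` (warmup steps = ceil(ratio x total steps)) and the linear warmup/decay shape. It is an illustration, not the library code.

```python
import math

PEAK_LR = 2e-05     # learning_rate
WARMUP_RATIO = 0.1  # warmup_ratio
TOTAL_STEPS = 606   # assumption: ~101 steps/epoch (inferred from the training log) x 6 epochs


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Number of warmup steps when only a warmup ratio is given (rounded up)."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, total_steps: int = TOTAL_STEPS, peak: float = PEAK_LR,
              ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at an optimizer step: linear warmup to peak, then linear decay to 0."""
    warm = warmup_steps(total_steps, ratio)
    if step < warm:
        return peak * step / max(1, warm)
    return peak * max(0.0, (total_steps - step) / max(1, total_steps - warm))


print(warmup_steps(TOTAL_STEPS, WARMUP_RATIO))  # 61
for step in (0, 30, 61, 300, 606):
    print(step, linear_lr(step))
```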

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
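`batch_sampler: no_duplicates` matters for in-batch-negative losses. If the same text appears twice in one batch, one copy becomes a false negative for the other. The sketch below is a simplified, greedy version of the idea. It fills each batch with samples whose texts are not already in it and defers conflicting samples to later batches. `no_duplicate_batches` is a hypothetical helper that illustrates the concept under that assumption, not the sentence-transformers implementation.

```python
def no_duplicate_batches(samples: list[tuple[str, str]], batch_size: int) -> list[list[int]]:
    """Group sample indices into batches in which no text (anchor or positive) repeats.

    Greedy: scan the remaining samples in order, add each one whose texts are not
    yet in the current batch, and defer the rest to a later batch. The final batch
    may be smaller than batch_size (nothing is dropped).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = list(range(len(samples)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(samples[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)
        remaining = deferred
    return batches


pairs = [
    ("q1", "doc A"),
    ("q2", "doc A"),  # shares "doc A" with sample 0, so it must go in another batch
    ("q3", "doc B"),
    ("q4", "doc C"),
]
batches = no_duplicate_batches(pairs, batch_size=2)
print(batches)  # [[0, 2], [1, 3]]
```

Each batch always receives at least one sample, so the loop terminates. The trade-off is that heavily duplicated data produces some under-filled batches.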

Training Logs

Epoch    Step    Training Loss    Validation Loss
0.9901   100     1.4311           0.2171
1.9802   200     0.237            0.1718
2.9703   300     0.1466           0.1561
3.9604   400     0.1084           0.1541
4.9505   500     0.0879           0.1528
5.9406   600     0.0794           0.1514

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.4.0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Infonce

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)