SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
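
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence embedding is the average of the token embeddings, with padding positions excluded. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name mean_pool and the toy 2-dimensional vectors are illustrative assumptions (the real model produces 768-dimensional vectors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 and ignore padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Hypothetical toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Because the padded row is masked out, it does not influence the pooled vector, which is why sentences of different lengths can share a batch.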

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("knguyennguyen/mpnet_laptop1k_adjustedv2")
# Run inference
sentences = [
    'laptop with a large display, integrated graphics, and multiple connectivity options, featuring a sleek design and lightweight build. intended for general use.',
    'Title: HP Envy 17t CG 17.3" Touch FHD Laptop (Intel i7-1195G7 4-Core, 32GB RAM, 1TB PCIe SSD + 2TB HDD, Intel Iris Xe, 1920x1080, Backlit KB, FP Reader, WiFi 6, Win11H) w/Hub Descripion: [\'GreatPriceTech sells computers with custom/upgraded configurations to enhance system performance. If the computer has modifications as listed above, the manufacturer’s box was opened by our highly skilled technicians for testing, inspection, and installation of the upgrades according to the specifications advertised. All computers and components are brand new.\'\n \'Processor: Intel Core i7-1195G7 2.80GHz Processor (11th Gen, upto 5 GHz, 12MB Cache, 4-Cores)\'\n \'Processor:\'\n \'Intel Core i7-1195G7 2.80GHz Processor (11th Gen, upto 5 GHz, 12MB Cache, 4-Cores)\'\n \'Storage: 1TB PCIe SSD (Solid State Drive) + 2TB HDD (Hard Disk Drive)\'\n \'Storage:\' \'1TB PCIe SSD (Solid State Drive) + 2TB HDD (Hard Disk Drive)\'\n \'Memory: 32GB DDR4 SO-DIMM\' \'Memory:\' \'32GB DDR4 SO-DIMM\'\n \'Graphics: Intel Iris Xe Integrated Graphics,\' \'Graphics:\'\n \'Intel Iris Xe Integrated Graphics,\'\n \'Operating System: Windows 11 Home-64\' \'Operating System:\'\n \'Windows 11 Home-64\' \'Connectivity: Wi-Fi 6 AX201 Wifi, Bluetooth 5.0,\'\n \'Connectivity:\' \'Wi-Fi 6 AX201 Wifi, Bluetooth 5.0,\'\n \'Camera: 720p HD Webcam\' \'Camera:\' \'720p HD Webcam\'\n \'Input/Output: ,, Backlit Keyboard,\' \'Input/Output:\'\n \',, Backlit Keyboard,\'\n \'Display: 17.3" Full HD (1920x1080) 60Hz 16:9 Display\' \'Display:\'\n \'17.3" Full HD (1920x1080) 60Hz 16:9 Display\'\n \'Ports/Slots:, 2 USB 3.2 Gen1, 1 USB 2.0, 1 HDMI, Thunderbolt 3 (Type-C), SD Card Reader, No Optical Drive, Headphone/Microphone Combo Jack\'\n \'Ports/Slots:\'\n \', 2 USB 3.2 Gen1, 1 USB 2.0, 1 HDMI, Thunderbolt 3 (Type-C), SD Card Reader, No Optical Drive, Headphone/Microphone Combo Jack\'\n \'Battery: 65W Power Supply, 4-Cell 55 WHr Battery\' \'Battery:\'\n \'65W Power Supply, 4-Cell 55 WHr Battery\' \'Color: Natural Silver\'\n \'Color:\' \'Natural Silver\' \'Form/Style: Standard\' \'Form/Style:\' \'Standard\'\n \'Product Dimensions (WxLxH): 15.7 IN x 10.2 IN x 0.76 IN. Weight:  5.8lb\'\n \'Product Dimensions (WxLxH): 15.7 IN x 10.2 IN x 0.76 IN.\' \'Weight:\'\n \'5.8lb\'\n \'1 Year Manufacturer warranty from GreatPriceTech (Professionally upgraded by GreatPriceTech)\'\n \'1 Year Manufacturer warranty from GreatPriceTech (Professionally upgraded by GreatPriceTech)\']',
    'Title: Lenovo ThinkPad E14 14" FHD Business Laptop Computer, Intel Quad-Core i5 10210U Up to 4.2GHz (Beats i7-7500U), 8GB DDR4 RAM, 128GB SSD + 1TB HDD, AC WiFi, BT 5.0, Windows 10 Pro, 64GB USB Flash Drive Descripion: [\'iPuzzle sells computers with upgraded configurations. If the computer has modifications (listed above), then the manufacturer box is opened for it to be tested and inspected and to install the upgrades to achieve the specifications as advertised. If no modifications are listed, the item is unopened and untested. Through our in-depth inspection and testing, and defects can be significantly reduced.\'\n \'Processor\' \'Intel Core i5-10210U (4C / 8T, 1.6 / 4.2GHz, 6MB)\'\n \'Graphics\' \'Intel UHD Graphics\' \'Memory\' \'8GB DDR4-2666\' \'Storage\'\n \'128GB M.2 SSD + 1TB HDD 5400rpm 2.5"\' \'Display\'\n \'14" FHD (1920x1080) IPS 250nits Anti-glare\' \'Ethernet\' \'100/1000M\'\n \'WLAN + Bluetooth\' \'RTL8822CE 11ac, 2x2 + BT5.0\' \'Ports\'\n \'1x USB-C 3.1 Gen 1 (support data transfer, Power Delivery and DisplayPort)\'\n \'1x USB 3.1 Gen 1 (Always On)\' \'1x HDMI 1.4b\'\n \'1x headphone / microphone combo jack (3.5mm)\' \'1x USB 2.0\'\n \'1x Ethernet (RJ-45)\' \'1x USB 3.1 Gen 1\' \'Audio Chip\'\n \'High Definition (HD) Audio, Synaptic CX11880 codec\' \'Speakers\'\n \'Stereo speakers, 2W x2, Dolby Advanced Audio\' \'Camera\'\n \'720p with ThinkShutter\' \'Microphone\' \'2x, Array\' \'Battery\' \'45Wh\'\n \'Power Adapter\' \'65W USB-C\' \'Keyboard\' \'Non-backlit, English\' \'Color\'\n \'Black\' \'Dimensions(WxDxH)\' \'12.8 x 9.13 x 0.74 in\' \'Weight\' \'3.73 lbs\'\n \'Operating System\' \'Windows 10 Pro 64, English\']',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
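
In the snippet above, the first sentence reads like a shopper's query and the other two are product listings, so a typical use is to rank listings by cosine similarity to the query. The sketch below shows that ranking step in pure Python; the helper names cosine_similarity and rank_documents and the toy 2-dimensional vectors are hypothetical stand-ins for the 768-dimensional rows that model.encode would return.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs, best match first."""
    scored = [(idx, cosine_similarity(query_vec, doc))
              for idx, doc in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical toy embeddings; in practice these come from model.encode(...).
query = [1.0, 0.0]
docs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_documents(query, docs)
for idx, score in ranking:
    print(idx, round(score, 3))
```

Cosine similarity ignores vector length, so the second toy document still scores highest even though its norm differs from the query's.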

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,726 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 5 tokens, mean 21.78 tokens, max 62 tokens
    • sentence_1 (string): min 51 tokens, mean 230.13 tokens, max 256 tokens
  • Samples:
    sentence_0 sentence_1
    sentence_0: a laptop for online classes and remote learning. laptop with a large display, integrated graphics, substantial memory, and ample storage capacity.
    sentence_1: Title: Lenovo 2022 Newest IdeaPad 3 17 17.3" FHD Laptop Computer, Intel Quard-Core i7-1165G7, 20GB DDR4 RAM, 1TB PCIe SSD, WiFi 6, Bluetooth 5.1, Webcam, Arctic Grey, Windows 11, broag 64GB Flash Drive Descripion: ['Processor' 'Intel Core i7-1165G7 (4C / 8T, 2.8 / 4.7GHz, 12MB)'
    'Graphics' 'Integrated Intel Iris Xe Graphics' 'Memory' '20GB DDR4'
    'Storage' '1TB SSD M.2 PCIe NVMe' 'Display'
    '17.3" FHD (1920x1080) IPS 300nits Anti-glare, 72% NTSC' 'Ports & Slots'
    '1x USB 2.0' '1x USB 3.2 Gen 1'
    '1x USB-C 3.2 Gen 1 (support data transfer only)' '1x HDMI 1.4b'
    '1x Card reader' '1x Headphone / microphone combo jack (3.5mm)'
    '1x Power connector' 'WLAN + Bluetooth' 'Wi-Fi 6 11ax, 2x2 + BT5.1'
    'Camera' '720p with Privacy Shutter' 'Microphone' '2x, Array' 'Speakers'
    'Stereo speakers, 1.5W x2, Dolby Audio' 'Color' 'Arctic Grey' 'Keyboard'
    'Non-backlit, English' 'Security Chip' 'Firmware TPM 2.0'
    'Fingerprint Reader' 'Touch Style' 'Other Security'
    'Camera privacy shutter' 'Battery' 'Integrated 45Wh' 'Power Adapter'
    '65W Round Tip Wall-mount' 'Operating System' 'Windows 11 Home, English'
    'Dimensions(WxDxH)' '15.71 x 10.79 x 0.78 inches' 'Weight' '4.63 lbs']
    sentence_0: a laptop for high-speed multitasking and performance enhancement
    sentence_1: Title: HP 2021 Newest 15.6" Pavilion Laptop with FHD & IPS Display (AMD Ryzen 5 5500U 6-Core, 16GB RAM, 2TB PCIe SSD, AMD Radeon, (1920x1080), FP Reader, WiFi, Webcam, Bluetooth, Win 10 Home) w/Hub Descripion: ["GreatPriceTech sells computers with custom/upgraded configurations to enhance system performance. If the computer has modifications as listed above, the manufacturer’s box was opened by our highly skilled technicians for testing, inspection, and installation of the upgrades according to the specifications advertised. All computers and components are brand new.Processor: AMD Ryzen 5 5500U 2.1GHz Processor (5th Gen, upto 4 GHz, 8MB Cache, 6-Cores) Storage: 2TB PCIe SSD (Solid State Drive) Memory: 16GB DDR4 SO-DIMM Graphics: AMD Radeon Integrated Graphics, Operating System: Windows 10 Home-64 Connectivity: Wi-Fi 6 AX201 Wifi, Bluetooth 4.2, Camera: 720p HD Webcam Input/Output: Fingerprint Security System, Full-size Blue Keyboard, Display: 15.6'' Full HD (1920x1080) 60Hz 16:9 IPS Display Ports/Slots: 2 USB 3.1 Gen1, 1 HDMI, USB 3.1 Type-C Gen1, Micro SD Reader, Headphone/Microphone Combo Jack Battery: 45W Power Supply, 3-Cell 41 WHr Battery Color: Fog Blue Form/Style: Standard Product Dimensions (WxLxH): 9.1 IN x 14 IN x 0.5 IN. Weight: 4.1lb 1 Year Manufacturer warranty from GreatPriceTech (Professionally upgraded by GreatPriceTech)"]
    sentence_0: a laptop for multitasking and casual gaming
    sentence_1: Title: HP Probook 450 15.6" HD Flagship Business Laptop Computer Intel Quad-Core i5-8265U(Up to 3.9GHz, Beat i7-7500U), 16GB DDR4 RAM, 128GB PCIe SSD, 1TB HDD, Webcam, USB-C, HDMI, Win10 Pro,w/GM Accessories Descripion: ['We sells computers with upgraded configurations. If the computer has modifications (listed above), then the manufacturer box is opened for it to be tested and inspected and to install the upgrades to achieve the specifications as advertised. If no modifications are listed, the item is unopened and untested. Defects & blemishes are significantly reduced by our in depth inspection & testing.'
    'Product Name:' 'HP laptop' 'Operating System:' 'Windows 10 Pro (64-Bit)'
    'Processor:'
    '8th Generation Intel Core i5-8265U Processor @ 1.60GHz (4 Cores, 6M Cache, up to 3.90 GHz)'
    'Memory:' '16GB DDR4' 'Graphics:' 'Intel UHD Graphics 620' 'Display:'
    '15.6" 1366 x 768' 'Storage:' '1TB HDD; 128GB PCIe SSD' 'Optical Drive:'
    'None' 'Ports:'
    '2 x USB 3.1 Gen 1, 1 x USB 3.1 Type-C Gen 1 (Power delivery, DisplayPort), 1 x USB 2.0 (power port), 1 x HDMI, 1 x Headphone/Microphone Combo Jack'
    'Audio:' 'Single digital microphone' 'Input Device:'
    '720p HD webcam, SD, SDHC, SDXC' 'Communications:'
    'Wi-Fi (802.11ac), Bluetooth 4.2' 'Battery:'
    'HP Long Life 3-cell, 45 Wh Li-ion' 'AC Adapter:' '45-watt AC adapter'
    'Dimensions:' '14.37" x 10.11" x 0.75"' 'Weight:' '4.41 lbs']
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
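
To make the two loss parameters concrete: MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive and every other sentence_1 in the same batch as a negative, then applies cross-entropy to the cosine similarities multiplied by scale. The following is a simplified pure-Python sketch under that reading, not the library's code; the function names and the toy 2-dimensional embeddings are illustrative assumptions.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every candidate in the batch.
        logits = [scale * cos_sim(anchor, cand) for cand in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy batch of two pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

With scale set to 20, a correctly matched batch yields a loss close to zero while a mismatched one is penalised heavily. Larger batches also supply more in-batch negatives per anchor.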
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
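
As a rough worked example of what these settings imply, the sketch below derives the number of optimizer steps and the linearly decaying learning rate from the values listed above (3,726 samples, batch size 128, 3 epochs, learning rate 5e-05, no warmup, gradient_accumulation_steps 1, dataloader_drop_last False). It assumes training on a single device, which the card does not state, and the variable and function names are illustrative.

```python
import math

# Values taken from the card; a single training device is an assumption.
train_samples = 3726
batch_size = 128
epochs = 3
peak_lr = 5e-05

# dataloader_drop_last is False, so the final partial batch still counts.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def linear_lr(step):
    """Learning rate after `step` updates with linear decay and no warmup."""
    return peak_lr * max(0.0, (total_steps - step) / total_steps)


print(steps_per_epoch, total_steps)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions the run is short, about 30 updates per epoch and 90 in total, with the learning rate reaching zero at the final step.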

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}