SentenceTransformer based on google-bert/bert-base-uncased

This is a sentence-transformers model fine-tuned from google-bert/bert-base-uncased. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
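
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name mean_pool and the toy 2-dimensional vectors are illustrative assumptions (the real model works on 768-dimensional tensors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask == 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy example: three token positions, the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Padding positions are excluded through the attention mask, so the padding vector in the toy input does not affect the result.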

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jameswright/ws-wr-questions-bert-TSDAE-v1")
# Run inference
sentences = [
    'You have been sick so If questions it just - ’ being sorted, thanks Then assertiveness course and a . overtime isn selfish it just doing ’ s right for Why you?',
    'You have been off sick so it makes sense. If anyone questions it just grey rock - it’s being sorted, thanks. Then do an online assertiveness course and ask your GP for a CBT referral. Not doing overtime isn’t selfish - it’s just you doing what’s right for you. Why would you do anything else?',
    'Science works by the accumulation of evidence.\xa0 Independent groups work on projects and publish results.\xa0 Those results are examined and tested and examined again and tested again, such that they\'re either confirmed or discarded and further work continues accordingly.\xa0 If a scientist or doctor disagrees with the \'official line\' they\'re asked to present the data, methods and conclusions that have led to that disagreement so that it can by examined by the broader scientific and medical community.\xa0 \xa0 And yes, someone who goes on YouTube or wherever - whether doctor, scientist or layperson - and tells viewers that a vaccine alters DNA structure and destroys the immune system is either a grifter or a fruitcake. 1 minute ago, FIRETHORN1 said: ...Can you not accept that some people can hold an opposite view quite genuinely? To me, a "conspiracy theorist" is someone who believes what they are told, without any evidence to back it up. I can absolutely accept that someone can genuinely believe something without having any evidence at all to support that view.\xa0 I\'m believing that right now, in fact. 1 minute ago, FIRETHORN1 said: There is no evidence whatsoever that the vaccine works That is categorically, absolutely and undeniably false, as the most cursory of research will tell you.\xa0 But then you don\'t actually want\xa0 to believe me, do you?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
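
Since the model description lists cosine similarity as the similarity function, the pairwise scores can be understood as the computation sketched below. This is a pure-Python illustration on toy 2-dimensional vectors, not the library's code; the helper name cosine_similarity_matrix is an assumption made for this example.

```python
import math
from typing import List


def cosine_similarity_matrix(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Return the matrix of cosine similarities between every row of a and every row of b."""

    def dot(u: List[float], v: List[float]) -> float:
        return sum(x * y for x, y in zip(u, v))

    def norm(u: List[float]) -> float:
        return math.sqrt(dot(u, u))

    result = []
    for u in a:
        row = []
        for v in b:
            denom = norm(u) * norm(v)
            # Treat similarity with a zero vector as 0.0 instead of dividing by zero.
            row.append(dot(u, v) / denom if denom else 0.0)
        result.append(row)
    return result


# Toy stand-ins for embeddings; real embeddings from this model have 768 dimensions.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = cosine_similarity_matrix(vectors, vectors)
for row in scores:
    print([round(value, 4) for value in row])
```

Each vector scores approximately 1.0 against itself, orthogonal vectors score 0.0, and the result has one row and one column per input, matching the 3 by 3 shape in the usage example.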

Training Details

Training Dataset

Unnamed Dataset

  • Size: 132,712 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
               sentence_0             sentence_1
    type       string                 string
    details    min: 4 tokens          min: 17 tokens
               mean: 47.75 tokens     mean: 114.94 tokens
               max: 460 tokens        max: 512 tokens
  • Samples:
    Sample 1
    sentence_0: ’ Can really go to the doctors I ’ bored of the ” . Feels more like than a doctor, does sound depression, so seeing GP a first
    sentence_1: PetersRabbitt I don’t know? Can I really go to the doctors and say “hey, yes my problem is I’m bored all of the time”. Feels more like a me problem than one a doctor can help with. Yes, absolutely. It does sound like it could be depression, so seeing your GP is a good first step.
    Sample 2
    sentence_0: Ursuladevine Between 11 16, if hasn t, what has been been providing education have LakieLady Yesterday 15:34 My that dismissed offhand son the school have up for assessment . Within years referred diagnosed with PTSD,,, social anxiety and, decided his be.
    sentence_1: Ursuladevine · Yesterday 15:42 Between 11 and 16, if he hasn’t been attending school, what has he been doing? Has the LA been providing any education or have you been HE? LakieLady · Yesterday 15:34 My friend tried that, and the GP dismissed it offhand, saying that if her son was neurodivergent, the school would have picked up on it and referred him for assessment. Her DS was eight at the time. Within the next 2-3 years, he got much worse, was referred to CAMHS, diagnosed with significant MH problems (PTSD, GAD, depression, social anxiety disorder) and after a couple of years, CAMHS decided his mother might not be talking bollocks and that he might have ASD.
    Sample 3
    sentence_0: It sounds you were a child then came along realised here was he - and this to it . young, I'd imagine
    sentence_1: It sounds like you were hurt by one man when you were a child, then another came along and realised here was someone damaged he could dominate - and added his own abuse. They can sniff this out and are attracted to it. How old were you when he arrived? Very young, I'd imagine. Stepfather?
  • Loss: DenoisingAutoEncoderLoss
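
In the samples above, sentence_0 looks like sentence_1 with many tokens removed, which is the deletion noise used by TSDAE: the model is trained to reconstruct the original text from the damaged input. The sketch below illustrates that noise under stated assumptions and is not the code used to build this dataset: it splits on whitespace rather than using a real tokenizer, the function name delete_tokens is hypothetical, and the default del_ratio of 0.6 is the value the TSDAE paper reports as working well (the ratio used for this model is not stated).

```python
import random
from typing import Optional


def delete_tokens(text: str, del_ratio: float = 0.6, rng: Optional[random.Random] = None) -> str:
    """Randomly drop roughly del_ratio of the whitespace-separated tokens, keeping their order."""
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return text
    # Keep each token independently with probability (1 - del_ratio).
    kept = [word for word in words if rng.random() > del_ratio]
    # Never return an empty input: fall back to one randomly chosen token.
    if not kept:
        kept = [rng.choice(words)]
    return " ".join(kept)


# Example text borrowed from the samples above; the seed only makes the run repeatable.
original = "It does sound like it could be depression, so seeing your GP is a good first step."
noisy = delete_tokens(original, rng=random.Random(0))
print(noisy)
```

The surviving tokens keep their original order, which is why the noisy samples read like a compressed version of the source text.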

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
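
With lr_scheduler_type set to linear, warmup_steps at 0, and learning_rate at 5e-05, the learning rate should fall in a straight line from 5e-05 to 0 over training. The sketch below reproduces that shape in plain Python; it is an illustration rather than the scheduler's source. The function name linear_lr is hypothetical, and TOTAL_STEPS = 82945 is an assumption derived from 132,712 samples, a per-device batch size of 8, 5 epochs, and a single training device.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: 132,712 samples / batch size 8 * 5 epochs on one device.
TOTAL_STEPS = 82945

for step in (0, 20000, 40000, 60000, 82945):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```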

Training Logs

Epoch Step Training Loss
0.0301 500 4.7687
0.0603 1000 4.2523
0.0904 1500 4.1156
0.1206 2000 4.0278
0.1507 2500 3.9652
0.1808 3000 3.919
0.2110 3500 3.8629
0.2411 4000 3.7985
0.2713 4500 3.7625
0.3014 5000 3.7523
0.3315 5500 3.7316
0.3617 6000 3.6837
0.3918 6500 3.669
0.4220 7000 3.6394
0.4521 7500 3.6017
0.4822 8000 3.5693
0.5124 8500 3.5821
0.5425 9000 3.5488
0.5727 9500 3.5139
0.6028 10000 3.5119
0.6329 10500 3.4988
0.6631 11000 3.4741
0.6932 11500 3.4719
0.7234 12000 3.4501
0.7535 12500 3.4353
0.7837 13000 3.4107
0.8138 13500 3.4023
0.8439 14000 3.3902
0.8741 14500 3.3697
0.9042 15000 3.3731
0.9344 15500 3.3603
0.9645 16000 3.3284
0.9946 16500 3.3339
1.0248 17000 3.2793
1.0549 17500 3.2098
1.0851 18000 3.1994
1.1152 18500 3.1801
1.1453 19000 3.1634
1.1755 19500 3.1566
1.2056 20000 3.1205
1.2358 20500 3.1064
1.2659 21000 3.1028
1.2960 21500 3.099
1.3262 22000 3.1028
1.3563 22500 3.0653
1.3865 23000 3.044
1.4166 23500 3.0481
1.4467 24000 3.0133
1.4769 24500 2.9667
1.5070 25000 3.0226
1.5372 25500 2.991
1.5673 26000 2.9593
1.5974 26500 2.9598
1.6276 27000 2.9572
1.6577 27500 2.9579
1.6879 28000 2.9303
1.7180 28500 2.948
1.7481 29000 2.918
1.7783 29500 2.9014
1.8084 30000 2.8948
1.8386 30500 2.8916
1.8687 31000 2.8787
1.8988 31500 2.8864
1.9290 32000 2.8649
1.9591 32500 2.8419
1.9893 33000 2.8688
2.0194 33500 2.8329
2.0496 34000 2.7442
2.0797 34500 2.7501
2.1098 35000 2.7466
2.1400 35500 2.7343
2.1701 36000 2.7014
2.2003 36500 2.6891
2.2304 37000 2.6819
2.2605 37500 2.6779
2.2907 38000 2.6872
2.3208 38500 2.6758
2.3510 39000 2.6665
2.3811 39500 2.6392
2.4112 40000 2.6362
2.4414 40500 2.6038
2.4715 41000 2.5535
2.5017 41500 2.6081
2.5318 42000 2.6071
2.5619 42500 2.5571
2.5921 43000 2.5774
2.6222 43500 2.5556
2.6524 44000 2.5683
2.6825 44500 2.5317
2.7126 45000 2.5509
2.7428 45500 2.5292
2.7729 46000 2.52
2.8031 46500 2.4818
2.8332 47000 2.5258
2.8633 47500 2.482
2.8935 48000 2.5038
2.9236 48500 2.4864
2.9538 49000 2.4591
2.9839 49500 2.4887
3.0140 50000 2.4635
3.0442 50500 2.3837
3.0743 51000 2.3886
3.1045 51500 2.3836
3.1346 52000 2.38
3.1647 52500 2.3456
3.1949 53000 2.3171
3.2250 53500 2.3341
3.2552 54000 2.3228
3.2853 54500 2.3459
3.3154 55000 2.3251
3.3456 55500 2.3365
3.3757 56000 2.2838
3.4059 56500 2.3042
3.4360 57000 2.2465
3.4662 57500 2.2304
3.4963 58000 2.251
3.5264 58500 2.2727
3.5566 59000 2.2324
3.5867 59500 2.2325
3.6169 60000 2.2246
3.6470 60500 2.2287
3.6771 61000 2.2067
3.7073 61500 2.2206
3.7374 62000 2.1882
3.7676 62500 2.1889
3.7977 63000 2.1559
3.8278 63500 2.2021
3.8580 64000 2.1643
3.8881 64500 2.145
3.9183 65000 2.1707
3.9484 65500 2.1349
3.9785 66000 2.1659
4.0087 66500 2.152
4.0388 67000 2.0801
4.0690 67500 2.0729
4.0991 68000 2.0676
4.1292 68500 2.0622
4.1594 69000 2.0376
4.1895 69500 2.027
4.2197 70000 2.0227
4.2498 70500 2.0146
4.2799 71000 2.0334
4.3101 71500 2.0428
4.3402 72000 2.034
4.3704 72500 1.9907
4.4005 73000 2.0106
4.4306 73500 1.9488
4.4608 74000 1.961
4.4909 74500 1.9351
4.5211 75000 1.9875
4.5512 75500 1.9454
4.5813 76000 1.9453
4.6115 76500 1.9239
4.6416 77000 1.9664
4.6718 77500 1.906
4.7019 78000 1.9256
4.7321 78500 1.9071
4.7622 79000 1.9117
4.7923 79500 1.8817
4.8225 80000 1.9101
4.8526 80500 1.8872
4.8828 81000 1.8634
4.9129 81500 1.8791
4.9430 82000 1.8801
4.9732 82500 1.8586
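
The Epoch column in the log can be reproduced from the dataset size and batch size reported above. The short calculation below is a sanity check, assuming a single training device and no gradient accumulation beyond the listed gradient_accumulation_steps of 1; the variable and function names are illustrative.

```python
import math

NUM_SAMPLES = 132_712  # training samples reported under Training Dataset
BATCH_SIZE = 8         # per_device_train_batch_size
NUM_EPOCHS = 5         # num_train_epochs

# Assumption: one device, so each optimizer step consumes one batch of 8 samples.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after the given number of optimizer steps."""
    return step / steps_per_epoch


print(steps_per_epoch)             # 16589
print(total_steps)                 # 82945
print(round(epoch_at(500), 4))     # 0.0301
print(round(epoch_at(82500), 4))   # 4.9732
```

The computed values match the first and last rows of the log, which is consistent with the single-device assumption.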

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

DenoisingAutoEncoderLoss

@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna", 
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}

Model size: 109M params (Safetensors, tensor type F32)