SentenceTransformer based on NeuML/pubmedbert-base-embeddings

This is a sentence-transformers model fine-tuned from NeuML/pubmedbert-base-embeddings on the mimic10-hard-negatives dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
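
The Pooling module above has pooling_mode_mean_tokens enabled, so a sentence embedding is the average of the token embeddings, with padding positions excluded through the attention mask. The following is a minimal plain-Python sketch of that idea, not the library's code: mean_pool is an illustrative name, and the toy vectors have 2 dimensions instead of 768.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; skip padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


# Two real tokens followed by one padding token (mask entry 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding vector does not influence the result, which is why sentences of different lengths in one batch still produce comparable embeddings.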

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("alecocc/icd10-hard-negatives")
# Run inference
sentences = [
    'CAD',
    'Atherosclerotic heart disease of native coronary artery with unspecified angina pectoris',
    'Myopia, bilateral',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
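
For semantic search, the similarity scores can be used to rank candidate descriptions against a query such as an abbreviation. The sketch below shows the ranking step on its own in plain Python. The helper names cosine and rank_candidates are illustrative, and the 3-dimensional toy vectors stand in for rows of embeddings; the scores they produce are not model outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, candidate_vecs, labels):
    """Return (score, label) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), label)
              for vec, label in zip(candidate_vecs, labels)]
    return sorted(scored, reverse=True)


# Toy vectors only; replace them with rows of `embeddings` in practice.
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
labels = ["candidate A", "candidate B"]
ranked = rank_candidates(query, candidates, labels)
for score, label in ranked:
    print(f"{score:.3f}  {label}")
```

With the real model, the query vector would be embeddings[0] and the candidates embeddings[1:], so that 'CAD' is scored against the two descriptions.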

Training Details

Training Dataset

mimic10-hard-negatives

  • Dataset: mimic10-hard-negatives at ef88fe5
  • Size: 473,546 training samples
  • Columns: anchor, positive, negative_1, negative_2, negative_3, negative_4, negative_5, negative_6, negative_7, negative_8, negative_9, and negative_10
  • Approximate statistics based on the first 1000 samples:
    column       type    min tokens  mean tokens  max tokens
    anchor       string  3           4.53         14
    positive     string  3           9.67         40
    negative_1   string  3           10.19        40
    negative_2   string  3           10.49        40
    negative_3   string  3           10.8         40
    negative_4   string  3           11.1         40
    negative_5   string  3           11.64        38
    negative_6   string  3           15.14        37
    negative_7   string  3           15.58        40
    negative_8   string  4           15.1         40
    negative_9   string  3           14.96        37
    negative_10  string  3           15.35        38
  • Samples:
    Sample 1
      anchor: Anterior exenteration
      positive: Malignant neoplasm of bladder neck
      negative_1: Malignant neoplasm of unspecified kidney, except renal pelvis
      negative_2: Malignant neoplasm of unspecified renal pelvis
      negative_3: Malignant neoplasm of left ureter
      negative_4: Malignant neoplasm of paraurethral glands
      negative_5: Malignant neoplasm of left renal pelvis
      negative_6: Unspecified kyphosis, cervical region
      negative_7: Unspecified superficial injuries of left back wall of thorax, initial encounter
      negative_8: Dome fracture of acetabulum
      negative_9: Other fracture of left great toe, initial encounter for open fracture
      negative_10: Unspecified fracture of upper end of unspecified radius, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with malunion
    Sample 2
      anchor: Atorvastatin
      positive: Hyperlipidemia, unspecified
      negative_1: Other lactose intolerance
      negative_2: Lipomatosis, not elsewhere classified
      negative_3: Mucopolysaccharidosis, type II
      negative_4: Hyperuricemia without signs of inflammatory arthritis and tophaceous disease
      negative_5: Volume depletion, unspecified
      negative_6: Glaucoma secondary to other eye disorders, unspecified eye
      negative_7: Fracture of one rib, left side, subsequent encounter for fracture with routine healing
      negative_8: Toxic effect of other tobacco and nicotine, accidental (unintentional), sequela
      negative_9: Puncture wound without foreign body of left ring finger with damage to nail
      negative_10: Nondisplaced fracture of epiphysis (separation) (upper) of unspecified femur, subsequent encounter for open fracture type IIIA, IIIB, or IIIC with nonunion
    Sample 3
      anchor: Urostomy
      positive: Malignant neoplasm of bladder neck
      negative_1: Malignant neoplasm of urinary organ, unspecified
      negative_2: Malignant neoplasm of overlapping sites of urinary organs
      negative_3: Malignant neoplasm of left ureter
      negative_4: Malignant neoplasm of urethra
      negative_5: Malignant neoplasm of left renal pelvis
      negative_6: Indeterminate leprosy
      negative_7: Poisoning by other viral vaccines, accidental (unintentional)
      negative_8: Fracture of unspecified metatarsal bone(s), right foot, initial encounter for open fracture
      negative_9: Sprain of tarsometatarsal ligament of unspecified foot, subsequent encounter
      negative_10: Burn of first degree of multiple sites of left ankle and foot, initial encounter
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
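
To make the loss parameters concrete: for each anchor, MultipleNegativesRankingLoss applies cross-entropy over cosine similarities multiplied by the scale (20.0 here), with the anchor's own positive as the correct class. The sketch below computes that quantity for a single anchor in plain Python. The function name is illustrative, and the candidate count assumes that the positives and all ten hard negatives of every row in a batch of 128 are pooled as candidates, which is how I understand the Sentence Transformers implementation to behave.

```python
import math


def mnrl_loss_for_anchor(cos_sims, positive_index=0, scale=20.0):
    """Cross-entropy over scaled cosine similarities for one anchor."""
    logits = [scale * s for s in cos_sims]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_sum_exp - logits[positive_index]


batch_size = 128
hard_negatives_per_row = 10
# Assumption: every positive and every hard negative in the batch is a candidate.
candidates_per_anchor = batch_size * (1 + hard_negatives_per_row)

# A model that scores all candidates equally gets log(number of candidates).
uniform_loss = mnrl_loss_for_anchor([0.0] * candidates_per_anchor)
# A model that separates the positive cleanly gets a loss close to zero.
separated_loss = mnrl_loss_for_anchor([1.0, 0.0])
print(candidates_per_anchor, uniform_loss, separated_loss)
```

The scale controls how sharply differences in cosine similarity are turned into probabilities; with cosine values bounded in [-1, 1], an unscaled softmax would stay close to uniform.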
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
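
As a rough sketch, the non-default values above could be passed to the Sentence Transformers training arguments as follows. The dictionary name, the build_training_args helper, and the output_dir argument are illustrative and not taken from the original training script; the library import is deferred so the settings can be inspected without it installed.

```python
# Non-default values copied from the list above.
NON_DEFAULT_ARGS = {
    "per_device_train_batch_size": 128,
    "per_device_eval_batch_size": 128,
    "learning_rate": 2e-05,
    "num_train_epochs": 2,
    "warmup_ratio": 0.1,
    "fp16": True,
    "batch_sampler": "no_duplicates",
}


def build_training_args(output_dir):
    """Build training arguments; output_dir is a path of your choosing."""
    # Imported lazily so the dictionary above is usable without the library.
    from sentence_transformers import SentenceTransformerTrainingArguments

    return SentenceTransformerTrainingArguments(
        output_dir=output_dir, **NON_DEFAULT_ARGS
    )
```

Note that fp16 mixed precision generally requires a GPU, so constructing these arguments on a CPU-only machine may fail unless fp16 is set to False.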

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
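
The combination of lr_scheduler_type: linear, warmup_ratio: 0.1 and learning_rate: 2e-05 implies a learning rate that rises linearly from zero during warmup and then decays linearly back to zero. The sketch below reproduces that shape in plain Python. It assumes 7,400 total optimizer steps (the final step in the training log) and therefore 740 warmup steps; the function and constant names are illustrative.

```python
BASE_LR = 2e-05
TOTAL_STEPS = 7400   # assumption: final step reported in the training log
WARMUP_STEPS = 740   # 10% of TOTAL_STEPS, from warmup_ratio: 0.1


def linear_schedule_lr(step, total_steps=TOTAL_STEPS,
                       warmup_steps=WARMUP_STEPS, base_lr=BASE_LR):
    """Learning rate at a given optimizer step: linear warmup, linear decay."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


for step in (0, 370, 740, 4070, 7400):
    print(step, linear_schedule_lr(step))
```

The peak value of 2e-05 is reached only at the end of warmup, so most of the run is spent at a lower learning rate.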

Training Logs

Epoch Step Training Loss
0.0270 100 4.1948
0.0541 200 3.5402
0.0811 300 3.2462
0.1081 400 2.9691
0.1351 500 2.788
0.1622 600 2.5922
0.1892 700 2.5648
0.2162 800 2.4821
0.2432 900 2.47
0.2703 1000 2.3774
0.2973 1100 2.3415
0.3243 1200 2.2428
0.3514 1300 2.2794
0.3784 1400 2.2372
0.4054 1500 2.2265
0.4324 1600 2.2186
0.4595 1700 2.2074
0.4865 1800 2.159
0.5135 1900 2.1903
0.5405 2000 2.1328
0.5676 2100 2.0685
0.5946 2200 2.1249
0.6216 2300 2.1321
0.6486 2400 2.0725
0.6757 2500 2.0913
0.7027 2600 2.0192
0.7297 2700 2.036
0.7568 2800 1.9863
0.7838 2900 2.0411
0.8108 3000 1.9796
0.8378 3100 2.0102
0.8649 3200 1.8652
0.8919 3300 1.0192
0.9189 3400 0.9623
0.9459 3500 0.957
0.9730 3600 0.8579
1.0 3700 0.7984
1.0270 3800 0.6359
1.0541 3900 0.7348
1.0811 4000 0.6356
1.1081 4100 0.6252
1.1351 4200 0.6587
1.1622 4300 0.602
1.1892 4400 0.6803
1.2162 4500 0.6204
1.2432 4600 0.667
1.2703 4700 0.6253
1.2973 4800 0.5375
1.3243 4900 0.6054
1.3514 5000 0.4541
1.3784 5100 0.5334
1.4054 5200 0.6075
1.4324 5300 0.5037
1.4595 5400 0.4825
1.4865 5500 0.5442
1.5135 5600 0.4999
1.5405 5700 0.6521
1.5676 5800 0.5769
1.5946 5900 0.5029
1.6216 6000 0.5787
1.6486 6100 0.5235
1.6757 6200 0.5839
1.7027 6300 0.5339
1.7297 6400 0.5339
1.7568 6500 0.4515
1.7838 6600 0.5648
1.8108 6700 0.4355
1.8378 6800 0.5321
1.8649 6900 0.4778
1.8919 7000 0.4884
1.9189 7100 0.5941
1.9459 7200 0.5489
1.9730 7300 0.444
2.0 7400 0.4964
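
The Epoch and Step columns can be cross-checked against the dataset size and batch size reported earlier. The sketch below assumes a single device, gradient_accumulation_steps: 1 and dataloader_drop_last: False (so the final partial batch counts as a step); the variable and function names are illustrative.

```python
import math

num_samples = 473_546        # training samples in mimic10-hard-negatives
batch_size = 128             # per_device_train_batch_size
num_epochs = 2               # num_train_epochs

# The last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch for a step, rounded like the Epoch column."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps)
print(epoch_at(100), epoch_at(3600), epoch_at(7400))
```

Under these assumptions one epoch is 3,700 steps, which agrees with the log reporting epoch 1.0 at step 3700 and epoch 2.0 at step 7400.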

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.45.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.29.0.dev0
  • Datasets: 2.18.0
  • Tokenizers: 0.20.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)