---
base_model: cointegrated/LaBSE-en-ru
language:
  - ru
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - negative_mse
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:10975066
  - loss:MSELoss
widget:
  - source_sentence: Такие лодки строились, чтобы получить быстрый доступ к приходящим судам.
    sentences:
      - been nice talking to you
      - >-
        Нельзя ставить под сомнение притязания клиента, если не были предприняты
        шаги.
      - >-
        Dharangaon Railway Station serves Dharangaon in Jalgaon district in the
        Indian state of Maharashtra.
  - source_sentence: >-
      Если прилагательные смягчают этнические термины, существительные могут
      сделать их жестче.
    sentences:
      - >-
        Вслед за этим последовало секретное письмо А.Б.Чубайса об изъятии у МЦР,
        переданного ему С.Н.Рерихом наследия.
      - Coaches should not give young athletes a hard time.
      - Эшкрофт хотел прослушивать сводки новостей снова и снова
  - source_sentence: Земля была мягкой.
    sentences:
      - >-
        По мере того, как самообладание покидало его, сердце его все больше
        наполнялось тревогой.
      - >-
        Our borders and immigration system, including law enforcement, ought to
        send a message of welcome, tolerance, and justice to members of
        immigrant communities in the United States and in their countries of
        origin.
      - >-
        Начнут действовать льготные условия аренды земель, которые предназначены
        для реализации инвестиционных проектов.
  - source_sentence: >-
      Что же касается рава Кука: мой рав лично знал его и много раз с теплотой
      рассказывал мне о нем как о великом каббалисте.
    sentences:
      - Вдова Эдгара Эванса, его дети и мать получили 1500 фунтов стерлингов (
      - Please do not make any changes to your address.
      - Мы уже закончили все запланированные дела!
  - source_sentence: See Name section.
    sentences:
      - >-
        Ms. Packard is the voice of the female blood elf in the video game World
        of Warcraft.
      - >-
        Основным функциональным элементом, реализующим функции управления
        соединением, является абонентский терминал.
      - Yeah, people who might not be hungry.
model-index:
  - name: SentenceTransformer based on cointegrated/LaBSE-en-ru
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.5305176535187099
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6347069834349862
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.5553415140113596
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6389336208598283
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.5499910306125031
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6347073809507647
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5305176585564861
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6347078463557637
            name: Spearman Dot
          - type: pearson_max
            value: 0.5553415140113596
            name: Pearson Max
          - type: spearman_max
            value: 0.6389336208598283
            name: Spearman Max
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: negative_mse
            value: -0.006337030936265364
            name: Negative Mse
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.5042796836494269
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.5986471772428711
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.522744495080616
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.5983901280447074
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.522721961447153
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.5986471095414022
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.504279685613151
            name: Pearson Dot
          - type: spearman_dot
            value: 0.598648155615724
            name: Spearman Dot
          - type: pearson_max
            value: 0.522744495080616
            name: Pearson Max
          - type: spearman_max
            value: 0.598648155615724
            name: Spearman Max
---

SentenceTransformer based on cointegrated/LaBSE-en-ru

This is a sentence-transformers model finetuned from cointegrated/LaBSE-en-ru. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cointegrated/LaBSE-en-ru
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
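
Because a SentenceTransformer behaves like a torch.nn.Sequential over these modules, the pipeline above can be stepped through manually. The following is a minimal illustrative sketch (using only the public sentence-transformers API, nothing specific to this card) that tokenizes two sentences and passes them through the Transformer, CLS pooling, Dense (tanh), and Normalize modules in order:

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

# Tokenize a Russian/English pair and run it through each module in turn:
# Transformer -> Pooling (CLS token) -> Dense (768 -> 768, tanh) -> Normalize
features = model.tokenize(["Земля была мягкой.", "The ground was soft."])
with torch.no_grad():
    for module in model:
        features = module(features)

embeddings = features["sentence_embedding"]
print(embeddings.shape)                      # torch.Size([2, 768])
print(torch.linalg.norm(embeddings, dim=1))  # ~1.0 after Normalize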

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
# Run inference
sentences = [
    'See Name section.',
    'Ms. Packard is the voice of the female blood elf in the video game World of Warcraft.',
    'Yeah, people who might not be hungry.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
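
Since the model is bilingual (Russian/English), a typical use is scoring cross-lingual sentence pairs. A short sketch following the same API as above; the sentence pairs are purely illustrative:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

russian = ["Земля была мягкой.", "Мы уже закончили все запланированные дела!"]
english = ["The ground was soft.", "We have already finished everything we planned!"]

# Cosine similarity between every Russian and every English sentence
similarities = model.similarity(model.encode(russian), model.encode(english))
print(similarities)  # 2x2 tensor; matching translations should score highest in their row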

Evaluation

Metrics

Semantic Similarity

Dataset: sts-dev

Metric              Value
pearson_cosine      0.5305
spearman_cosine     0.6347
pearson_manhattan   0.5553
spearman_manhattan  0.6389
pearson_euclidean   0.5500
spearman_euclidean  0.6347
pearson_dot         0.5305
spearman_dot        0.6347
pearson_max         0.5553
spearman_max        0.6389

Knowledge Distillation

Metric Value
negative_mse -0.0063
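
negative_mse is the negated mean squared error between the student's embeddings and the teacher's embeddings on the evaluation set. A rough sketch of how such a number can be computed is below; it assumes the teacher is the base model cointegrated/LaBSE-en-ru and that the MSE is scaled by 100, as sentence-transformers' MSEEvaluator does.

import numpy as np
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")                           # assumed teacher
student = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

sentences = ["Земля была мягкой.", "See Name section."]  # placeholder evaluation sentences

mse = np.mean((teacher.encode(sentences) - student.encode(sentences)) ** 2)
print(-mse * 100)  # comparable in spirit to the negative_mse metric reported above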

Semantic Similarity

Dataset: sts-test

Metric              Value
pearson_cosine      0.5043
spearman_cosine     0.5986
pearson_manhattan   0.5227
spearman_manhattan  0.5984
pearson_euclidean   0.5227
spearman_euclidean  0.5986
pearson_dot         0.5043
spearman_dot        0.5986
pearson_max         0.5227
spearman_max        0.5986
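
These scores come from an EmbeddingSimilarityEvaluator-style evaluation. The card does not name the exact data behind "sts-dev"/"sts-test", so the sketch below uses the public stsb_multi_mt Russian split purely as an illustration of how such numbers can be reproduced:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

# Illustrative STS data; not necessarily the split used for this card
sts = load_dataset("stsb_multi_mt", "ru", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=sts["sentence1"],
    sentences2=sts["sentence2"],
    scores=[s / 5.0 for s in sts["similarity_score"]],  # rescale 0-5 gold scores to 0-1
    main_similarity=SimilarityFunction.COSINE,
    name="sts-test",
)
print(evaluator(model))  # dict of pearson/spearman scores for cosine, manhattan, euclidean, dot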

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,975,066 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence: string; min: 6 tokens, mean: 26.93 tokens, max: 139 tokens
    • label: list of 768 elements
  • Samples:
    • sentence: It is based on the Java Persistence API (JPA), but it does not strictly follow the JSR 338 Specification, as it implements different design patterns and technologies.
      label: [-0.012331949546933174, -0.04570527374744415, -0.024963658303022385, -0.03620213270187378, 0.022556383162736893, ...]
    • sentence: Покупаем вторичное сырье в Каунасе (Переработка вторичного сырья) - Алфенас АНД КО, ЗАО на Bizorg.
      label: [-0.07498518377542496, -0.01913534104824066, -0.01797042042016983, 0.048263177275657654, -0.00016611881437711418, ...]
    • sentence: At the Equal Justice Conference ( EJC ) held in March 2001 in San Diego , LSC and the Project for the Future of Equal Justice held the second Case Management Software pre-conference .
      label: [0.03870972990989685, -0.0638347640633583, -0.01696585863828659, -0.043612319976091385, -0.048241738229990005, ...]
  • Loss: MSELoss
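
The label column holds 768-dimensional teacher embeddings, which is how MSELoss-based distillation data is typically prepared. A minimal sketch of building such a dataset, assuming the teacher is the base model cointegrated/LaBSE-en-ru:

from datasets import Dataset
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")  # assumed teacher model

sentences = [
    "Земля была мягкой.",
    "The Canadian Canoe Museum is a museum dedicated to canoes located in Peterborough, Ontario, Canada.",
]
labels = teacher.encode(sentences).tolist()  # one 768-dim vector per sentence

train_dataset = Dataset.from_dict({"sentence": sentences, "label": labels})
print(train_dataset)  # Dataset({features: ['sentence', 'label'], num_rows: 2})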

Evaluation Dataset

Unnamed Dataset

  • Size: 10,000 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence: string; min: 5 tokens, mean: 24.18 tokens, max: 111 tokens
    • label: list of 768 elements
  • Samples:
    • sentence: The Canadian Canoe Museum is a museum dedicated to canoes located in Peterborough, Ontario, Canada.
      label: [-0.05444105342030525, -0.03650881350040436, -0.041163671761751175, -0.010616903193295002, -0.04094529151916504, ...]
    • sentence: И мне нравилось, что я одновременно зарабатываю и смотрю бои».
      label: [-0.03404555842280388, 0.028203096240758896, -0.056121889501810074, -0.0591997392475605, -0.05523117259144783, ...]
    • sentence: Ну, а на следующий день, разумеется, Президент Кеннеди объявил блокаду Кубы, и наши корабли остановили у кубинских берегов направлявшийся на Кубу российский корабль, и у него на борту нашли ракеты.
      label: [-0.008193841204047203, 0.00694894278421998, -0.03027420863509178, -0.03290146216750145, 0.01425305474549532, ...]
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0001
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
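
For context, the non-default values above map onto SentenceTransformerTrainingArguments roughly as in the sketch below. This is a hypothetical reconstruction, not the exact training script: the student here is only a placeholder (the real student presumably keeps a reduced subset of LaBSE-en-ru's layers, as the model name suggests), and the tiny inline datasets stand in for the 10.9M-sample training set and 10,000-sample evaluation set described under Training Details.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MSELoss

teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")
student = SentenceTransformer("cointegrated/LaBSE-en-ru")  # placeholder; the real student uses fewer layers

sentences = ["Земля была мягкой.", "See Name section.", "Yeah, people who might not be hungry."]
labels = teacher.encode(sentences).tolist()
train_dataset = Dataset.from_dict({"sentence": sentences, "label": labels})
eval_dataset = train_dataset  # tiny placeholder; the card reports 10,000 held-out samples

args = SentenceTransformerTrainingArguments(
    output_dir="LaBSE-en-ru-distilled-each-third-layer",
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=1e-4,
    warmup_ratio=0.1,
    fp16=True,               # requires a CUDA GPU
    eval_strategy="steps",
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=student,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=MSELoss(student),
)
trainer.train()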

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Click to expand
Epoch   Step    Training Loss   Validation Loss   negative_mse   sts-dev_spearman_cosine   sts-test_spearman_cosine
0 0 - - -0.2381 0.4206 -
0.0058 1000 0.0014 - - - -
0.0117 2000 0.0009 - - - -
0.0175 3000 0.0007 - - - -
0.0233 4000 0.0006 - - - -
0.0292 5000 0.0005 0.0004 -0.0363 0.6393 -
0.0350 6000 0.0004 - - - -
0.0408 7000 0.0004 - - - -
0.0467 8000 0.0003 - - - -
0.0525 9000 0.0003 - - - -
0.0583 10000 0.0003 0.0002 -0.0207 0.6350 -
0.0641 11000 0.0003 - - - -
0.0700 12000 0.0003 - - - -
0.0758 13000 0.0002 - - - -
0.0816 14000 0.0002 - - - -
0.0875 15000 0.0002 0.0002 -0.0157 0.6328 -
0.0933 16000 0.0002 - - - -
0.0991 17000 0.0002 - - - -
0.1050 18000 0.0002 - - - -
0.1108 19000 0.0002 - - - -
0.1166 20000 0.0002 0.0001 -0.0132 0.6317 -
0.1225 21000 0.0002 - - - -
0.1283 22000 0.0002 - - - -
0.1341 23000 0.0002 - - - -
0.1400 24000 0.0002 - - - -
0.1458 25000 0.0002 0.0001 -0.0118 0.6251 -
0.1516 26000 0.0002 - - - -
0.1574 27000 0.0002 - - - -
0.1633 28000 0.0002 - - - -
0.1691 29000 0.0002 - - - -
0.1749 30000 0.0002 0.0001 -0.0109 0.6304 -
0.1808 31000 0.0002 - - - -
0.1866 32000 0.0002 - - - -
0.1924 33000 0.0002 - - - -
0.1983 34000 0.0001 - - - -
0.2041 35000 0.0001 0.0001 -0.0102 0.6280 -
0.2099 36000 0.0001 - - - -
0.2158 37000 0.0001 - - - -
0.2216 38000 0.0001 - - - -
0.2274 39000 0.0001 - - - -
0.2333 40000 0.0001 0.0001 -0.0098 0.6272 -
0.2391 41000 0.0001 - - - -
0.2449 42000 0.0001 - - - -
0.2507 43000 0.0001 - - - -
0.2566 44000 0.0001 - - - -
0.2624 45000 0.0001 0.0001 -0.0093 0.6378 -
0.2682 46000 0.0001 - - - -
0.2741 47000 0.0001 - - - -
0.2799 48000 0.0001 - - - -
0.2857 49000 0.0001 - - - -
0.2916 50000 0.0001 0.0001 -0.0089 0.6325 -
0.2974 51000 0.0001 - - - -
0.3032 52000 0.0001 - - - -
0.3091 53000 0.0001 - - - -
0.3149 54000 0.0001 - - - -
0.3207 55000 0.0001 0.0001 -0.0087 0.6328 -
0.3266 56000 0.0001 - - - -
0.3324 57000 0.0001 - - - -
0.3382 58000 0.0001 - - - -
0.3441 59000 0.0001 - - - -
0.3499 60000 0.0001 0.0001 -0.0085 0.6357 -
0.3557 61000 0.0001 - - - -
0.3615 62000 0.0001 - - - -
0.3674 63000 0.0001 - - - -
0.3732 64000 0.0001 - - - -
0.3790 65000 0.0001 0.0001 -0.0083 0.6366 -
0.3849 66000 0.0001 - - - -
0.3907 67000 0.0001 - - - -
0.3965 68000 0.0001 - - - -
0.4024 69000 0.0001 - - - -
0.4082 70000 0.0001 0.0001 -0.0080 0.6325 -
0.4140 71000 0.0001 - - - -
0.4199 72000 0.0001 - - - -
0.4257 73000 0.0001 - - - -
0.4315 74000 0.0001 - - - -
0.4374 75000 0.0001 0.0001 -0.0078 0.6351 -
0.4432 76000 0.0001 - - - -
0.4490 77000 0.0001 - - - -
0.4548 78000 0.0001 - - - -
0.4607 79000 0.0001 - - - -
0.4665 80000 0.0001 0.0001 -0.0077 0.6323 -
0.4723 81000 0.0001 - - - -
0.4782 82000 0.0001 - - - -
0.4840 83000 0.0001 - - - -
0.4898 84000 0.0001 - - - -
0.4957 85000 0.0001 0.0001 -0.0076 0.6316 -
0.5015 86000 0.0001 - - - -
0.5073 87000 0.0001 - - - -
0.5132 88000 0.0001 - - - -
0.5190 89000 0.0001 - - - -
0.5248 90000 0.0001 0.0001 -0.0074 0.6306 -
0.5307 91000 0.0001 - - - -
0.5365 92000 0.0001 - - - -
0.5423 93000 0.0001 - - - -
0.5481 94000 0.0001 - - - -
0.5540 95000 0.0001 0.0001 -0.0073 0.6305 -
0.5598 96000 0.0001 - - - -
0.5656 97000 0.0001 - - - -
0.5715 98000 0.0001 - - - -
0.5773 99000 0.0001 - - - -
0.5831 100000 0.0001 0.0001 -0.0072 0.6333 -
0.5890 101000 0.0001 - - - -
0.5948 102000 0.0001 - - - -
0.6006 103000 0.0001 - - - -
0.6065 104000 0.0001 - - - -
0.6123 105000 0.0001 0.0001 -0.0071 0.6351 -
0.6181 106000 0.0001 - - - -
0.6240 107000 0.0001 - - - -
0.6298 108000 0.0001 - - - -
0.6356 109000 0.0001 - - - -
0.6415 110000 0.0001 0.0001 -0.0070 0.6330 -
0.6473 111000 0.0001 - - - -
0.6531 112000 0.0001 - - - -
0.6589 113000 0.0001 - - - -
0.6648 114000 0.0001 - - - -
0.6706 115000 0.0001 0.0001 -0.0070 0.6336 -
0.6764 116000 0.0001 - - - -
0.6823 117000 0.0001 - - - -
0.6881 118000 0.0001 - - - -
0.6939 119000 0.0001 - - - -
0.6998 120000 0.0001 0.0001 -0.0069 0.6305 -
0.7056 121000 0.0001 - - - -
0.7114 122000 0.0001 - - - -
0.7173 123000 0.0001 - - - -
0.7231 124000 0.0001 - - - -
0.7289 125000 0.0001 0.0001 -0.0068 0.6362 -
0.7348 126000 0.0001 - - - -
0.7406 127000 0.0001 - - - -
0.7464 128000 0.0001 - - - -
0.7522 129000 0.0001 - - - -
0.7581 130000 0.0001 0.0001 -0.0067 0.6340 -
0.7639 131000 0.0001 - - - -
0.7697 132000 0.0001 - - - -
0.7756 133000 0.0001 - - - -
0.7814 134000 0.0001 - - - -
0.7872 135000 0.0001 0.0001 -0.0067 0.6365 -
0.7931 136000 0.0001 - - - -
0.7989 137000 0.0001 - - - -
0.8047 138000 0.0001 - - - -
0.8106 139000 0.0001 - - - -
0.8164 140000 0.0001 0.0001 -0.0066 0.6339 -
0.8222 141000 0.0001 - - - -
0.8281 142000 0.0001 - - - -
0.8339 143000 0.0001 - - - -
0.8397 144000 0.0001 - - - -
0.8456 145000 0.0001 0.0001 -0.0066 0.6352 -
0.8514 146000 0.0001 - - - -
0.8572 147000 0.0001 - - - -
0.8630 148000 0.0001 - - - -
0.8689 149000 0.0001 - - - -
0.8747 150000 0.0001 0.0001 -0.0065 0.6357 -
0.8805 151000 0.0001 - - - -
0.8864 152000 0.0001 - - - -
0.8922 153000 0.0001 - - - -
0.8980 154000 0.0001 - - - -
0.9039 155000 0.0001 0.0001 -0.0065 0.6336 -
0.9097 156000 0.0001 - - - -
0.9155 157000 0.0001 - - - -
0.9214 158000 0.0001 - - - -
0.9272 159000 0.0001 - - - -
0.9330 160000 0.0001 0.0001 -0.0064 0.6334 -
0.9389 161000 0.0001 - - - -
0.9447 162000 0.0001 - - - -
0.9505 163000 0.0001 - - - -
0.9563 164000 0.0001 - - - -
0.9622 165000 0.0001 0.0001 -0.0064 0.6337 -
0.9680 166000 0.0001 - - - -
0.9738 167000 0.0001 - - - -
0.9797 168000 0.0001 - - - -
0.9855 169000 0.0001 - - - -
0.9913 170000 0.0001 0.0001 -0.0063 0.6347 -
0.9972 171000 0.0001 - - - -
1.0 171486 - - - - 0.5986
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}