SentenceTransformer based on am-azadi/gte-multilingual-base_Fine_Tuned_1e

This is a sentence-transformers model finetuned from am-azadi/gte-multilingual-base_Fine_Tuned_1e. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: am-azadi/gte-multilingual-base_Fine_Tuned_1e
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
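
Because the final Normalize() module L2-normalizes every embedding, cosine similarity between two embeddings reduces to a plain dot product. A quick illustration (plain PyTorch, no model required; the random vectors stand in for embeddings):

import torch
import torch.nn.functional as F

# Unit-normalize two random vectors, as Normalize() does to each embedding
a = F.normalize(torch.randn(768), dim=0)
b = F.normalize(torch.randn(768), dim=0)

print(torch.dot(a, b))                    # dot product
print(F.cosine_similarity(a, b, dim=0))   # identical value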

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer(
    "am-azadi/gte-multilingual-base_Fine_Tuned_2e",
    trust_remote_code=True,  # the custom GTE (NewModel) architecture above requires remote code
)
# Run inference
sentences = [
    'Paul Pelosi’s DUI charges were dropped, by an order from Gavin Newsom. see how this works !?!',
    "DUI charges against Nancy Pelosi's husband dropped",
    'FRAUDE ELECTORAL Se están volviendo a contar las actas de varias mesas en Cantabria',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
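
Beyond pairwise similarity, the embeddings can drive semantic search. A minimal sketch using the library's built-in utility (the query string here is illustrative):

from sentence_transformers import util

# Rank the corpus sentences above against a free-form query
query_embedding = model.encode("Were the DUI charges against Paul Pelosi dropped?")
hits = util.semantic_search(query_embedding, embeddings, top_k=2)
print(hits[0])
# e.g. [{'corpus_id': 1, 'score': ...}, {'corpus_id': 0, 'score': ...}]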

Training Details

Training Dataset

Unnamed Dataset

  • Size: 25,743 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string     | string     | float |
    | details | min: 2 tokens, mean: 133.84 tokens, max: 5210 tokens | min: 5 tokens, mean: 20.52 tokens, max: 140 tokens | min: 1.0, mean: 1.0, max: 1.0 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |:-----------|:-----------|:------|
    | Assinando folhas em branco | Joe Biden assinou seus primeiros decretos como presidente dos Estados Unidos em folhas em branco | 1.0 |
    | FIM DOS TEMPOS NOVA ZELÂNDIA PASSA A PERMITIR ABORTO ATÉ O NASCIMENTO. Parlamento ignora referendo popular e aprova lei. Texto nem exige que seja um médico a realizar o "procedimento". GIL DINIZ DEPUTADO ESTADUAL fto/carteiroreaca sensoCom a aprovação da lei, qualquer mulher poderá tirar a vida de seu bebê em qualquer fase da gravidez. Fim dos tempos! | Nova Zelândia passa a permitir aborto até o nascimento | 1.0 |
    | बताईये... बाप बार डांसर उठा लाया था, बेटा पोर्न स्टार ही उठा लाया राहुल जी के कांग्रेसी! फिर कहते हैं EVM हैक हो गई... मल्लब हद है एकदम से भारत के विकास Love you Miya Happy Bujix 44 2.5 | Indian National Congress workers feeding cake to a poster of a former porn actress Mia Khalifa | 1.0 |
  • Loss: MultipleNegativesRankingLoss (sketched after this list) with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
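
For reference, a minimal sketch of what MultipleNegativesRankingLoss computes, assuming the standard in-batch-negatives formulation (function and variable names are illustrative). Note that with per_device_train_batch_size: 1 there are no in-batch negatives, so the loss is identically zero, which matches the training logs below:

import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    # scores[i, j] = scale * cos_sim(anchor_i, positive_j); the positives of
    # the other pairs in the batch act as negatives for anchor_i
    scores = scale * F.cosine_similarity(
        anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1
    )
    labels = torch.arange(len(anchors))  # the true pair sits on the diagonal
    return F.cross_entropy(scores, labels)

# With a batch of one pair there is a single candidate, so the
# cross-entropy is exactly 0 regardless of the embeddings:
print(multiple_negatives_ranking_loss(torch.randn(1, 768), torch.randn(1, 768)))  # tensor(0.)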

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 1
  • per_device_eval_batch_size: 1
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
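
A sketch of a training script that would reproduce these settings with the Sentence Transformers v3 trainer. The dataset construction is an illustrative stand-in; the real 25,743-pair dataset is not published here:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer(
    "am-azadi/gte-multilingual-base_Fine_Tuned_1e", trust_remote_code=True
)

# Illustrative stand-in for the real (sentence_0, sentence_1, label) pairs
train_dataset = Dataset.from_dict({
    "sentence_0": ["Signing blank pages"],
    "sentence_1": ["Joe Biden signed his first decrees on blank pages"],
    "label": [1.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="gte-multilingual-base_Fine_Tuned_2e",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()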

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 1
  • per_device_eval_batch_size: 1
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

All logged training-loss values are 0.0. This is expected with MultipleNegativesRankingLoss at per_device_train_batch_size: 1: with no in-batch negatives, the cross-entropy over a single candidate is exactly zero (see the loss sketch above).

Epoch  Step   Training Loss
0.0194 500 0.0
0.0388 1000 0.0
0.0583 1500 0.0
0.0777 2000 0.0
0.0971 2500 0.0
0.1165 3000 0.0
0.1360 3500 0.0
0.1554 4000 0.0
0.1748 4500 0.0
0.1942 5000 0.0
0.2137 5500 0.0
0.2331 6000 0.0
0.2525 6500 0.0
0.2719 7000 0.0
0.2913 7500 0.0
0.3108 8000 0.0
0.3302 8500 0.0
0.3496 9000 0.0
0.3690 9500 0.0
0.3885 10000 0.0
0.4079 10500 0.0
0.4273 11000 0.0
0.4467 11500 0.0
0.4661 12000 0.0
0.4856 12500 0.0
0.5050 13000 0.0
0.5244 13500 0.0
0.5438 14000 0.0
0.5633 14500 0.0
0.5827 15000 0.0
0.6021 15500 0.0
0.6215 16000 0.0
0.6410 16500 0.0
0.6604 17000 0.0
0.6798 17500 0.0
0.6992 18000 0.0
0.7186 18500 0.0
0.7381 19000 0.0
0.7575 19500 0.0
0.7769 20000 0.0
0.7963 20500 0.0
0.8158 21000 0.0
0.8352 21500 0.0
0.8546 22000 0.0
0.8740 22500 0.0
0.8934 23000 0.0
0.9129 23500 0.0
0.9323 24000 0.0
0.9517 24500 0.0
0.9711 25000 0.0
0.9906 25500 0.0

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0
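
To reproduce this environment, the versions above can be pinned, e.g.:

pip install sentence-transformers==3.4.1 transformers==4.48.3 torch==2.5.1 accelerate==1.3.0 datasets==3.3.2 tokenizers==0.21.0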

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}