metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:21769
- loss:MultipleNegativesRankingLoss
base_model: am-azadi/bilingual-embedding-large_Fine_Tuned_2e
widget:
- source_sentence: >-
Amen.. This Quran was found at the bottom of the sea Has become a rock but
still intact subhanallah, hopefully those who like it, comment amen and
share this post sincerely the sustenance tomorrow morning will be abundant
from the opposite direction unexpected.amen اهيه
sentences:
- >-
Mexico deserves an Oscar for the coffin dance The video of uniformed men
doing "the coffin dance" was recorded in Colombia, not in Mexico
- >-
The Koran was found at the bottom of the sea already turned into a rock
but still intact This is a dictionary covered in crystal and is a work
of art by an American artist
- >-
Video purported to be a video celebrating the inauguration of Hamas' new
office in the Indian state of Kerala False, this claim is a video
celebrating the inauguration of the Hamas office in India
- source_sentence: ' P Stay alert ! 6710 A Japanese man killed his friend just because he didn give him 6x scope in PUBG TAG A PUBG LOVER'
sentences:
- >-
Japanese man killed friend over video game Japanese man killed his
friend in row over video game?
- >-
This photo shows the Glastonbury festival after Greta Thunberg's
participation in 2022 This Glastonbury festival photo is from 2015, not
2022 after Greta Thunberg's speech
- >-
Footage of damaged building was shot in Russia in 2018 Footage shows
Ukraine in 2022, not Russia in 2018
- source_sentence: >-
This is Manoj Tiwari, MP - North East Delhi I am busy bursting
firecrackers, after bursting firecrackers all night, I wake up in the
morning and say, "Today my eyes are burning in Delhi". Manoj Tiwari Today
my eyes are burning in Delhi, and yours? ,
sentences:
- >-
Images show recent unrest and brutality in Uganda None of these images
are related to Uganda’s ongoing political troubles
- >-
The photo shows Indian politician Manoj Tiwari lighting fireworks in
Delhi during smog crisis. This image of an Indian lawmaker lighting a
firecracker has circulated in reports since 2014
- >-
World Economic Forum tweet asks if age of consent should be lowered to
13 Fabricated World Economic Forum tweet about 'lowering age of consent'
misleads online
- source_sentence: ' : He Yunshi was arrested, as expected, but better than expected even faster. . . 6-1 LICEN Fang Bomei BOOT UML'
sentences:
- >-
In Chile they have just expropriated pensions It is not true that in
Chile “the pensions have just been expropriated”
- >-
Four British Airways airline pilots have died from the covid-19 vaccine
British Airways ruled out link between pilot deaths and vaccinations
- >-
Hong Kong Pro-democracy artist Denise Ho arrested in September 2021 Old
photos of Hong Kong pro-democracy activist shared in false 'news' of her
arrest
- source_sentence: >-
Uuuuu mepa that they killed the real bald guy EXCLUSIVE What are you doing
bald, go getting into it jonca that the 12 wants to take pictures with you
at any time 14:04 ✓ they found 2 contact cards for this number re add them
to your contacts? T SEE CONTACT CARDS CELL PHONES TURNED OFF THEY LOOK FOR
THEM EVERYWHERE THE "12" IS LOOKING FOR THEM "serne HD MRASSIA LEGAL: 11
2159 6256 FOR POLICE COMPLAINTS: 11 2159 6256 HERE FOR POLICE COMPLAINTS:
11-
sentences:
- >-
The elected mayor of Medellín does not like ESMAD. WHY WILL IT BE? The
original video shows Daniel Quintero in a demonstration against violence
in Bogotá
- >-
Warning in Paris about stroke in children in post-covid vaccine era The
stroke campaign in France is not about vaccinating children against
covid-19
- >-
They find Diego Molina murdered in his apartment, the skinny from the
funeral home who took photos with Diego Armando Maradona The images of a
lacerated body are not of the person who was photographed with the
corpse of Maradona
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on am-azadi/bilingual-embedding-large_Fine_Tuned_2e
This is a sentence-transformers model fine-tuned from am-azadi/bilingual-embedding-large_Fine_Tuned_2e. It maps sentences and paragraphs to a 1024-dimensional dense vector space. It can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: am-azadi/bilingual-embedding-large_Fine_Tuned_2e
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
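The module list above describes a three-stage pipeline. The transformer produces token embeddings, which are mean-pooled over the tokens (`pooling_mode_mean_tokens: True`) and then L2-normalized by `Normalize()`. The pure-Python sketch below illustrates only the last two stages on toy vectors. It is not the library's implementation, and `mean_pool` and `l2_normalize` are hypothetical helper names. Because the final step makes every embedding unit-length, the cosine similarity between two outputs equals their dot product.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1; padding is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for d in range(dim):
                summed[d] += vector[d]
            count += 1
    # Avoid division by zero for an all-padding input
    return [value / max(count, 1) for value in summed]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (the role of the Normalize() module)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Toy input: three "tokens" in a 2-d space; the last one is padding (mask 0)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)   # [2.0, 3.0]
embedding = l2_normalize(pooled)   # unit-length vector
print(pooled, sum(x * x for x in embedding))
```

In the real model the vectors have 1024 dimensions, and pooling runs on batched tensors rather than Python lists.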
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub ("sentence_transformers_model_id" is a placeholder for this model's Hub ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Uuuuu mepa that they killed the real bald guy EXCLUSIVE What are you doing bald, go getting into it jonca that the 12 wants to take pictures with you at any time 14:04 ✓ they found 2 contact cards for this number re add them to your contacts? T SEE CONTACT CARDS CELL PHONES TURNED OFF THEY LOOK FOR THEM EVERYWHERE THE "12" IS LOOKING FOR THEM "serne HD MRASSIA LEGAL: 11 2159 6256 FOR POLICE COMPLAINTS: 11 2159 6256 HERE FOR POLICE COMPLAINTS: 11-',
    'They find Diego Molina murdered in his apartment, the skinny from the funeral home who took photos with Diego Armando Maradona The images of a lacerated body are not of the person who was photographed with the corpse of Maradona',
    'The elected mayor of Medellín does not like ESMAD. WHY WILL IT BE? The original video shows Daniel Quintero in a demonstration against violence in Bogotá',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
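The embeddings are unit-normalized, so ranking candidate texts against a query reduces to sorting by dot product. An example is matching a claim to fact-check summaries such as the example sentences above. The sketch below assumes you already have embeddings as plain lists of floats, for instance from `model.encode(...).tolist()`. `top_k` is a hypothetical helper and is not part of the Sentence Transformers API.

```python
def top_k(query_embedding, corpus_embeddings, k=1):
    """Return (index, score) pairs for the k corpus vectors most similar to the query.

    Assumes every vector is already L2-normalized, so the dot product equals
    cosine similarity.
    """
    scores = [
        sum(q * c for q, c in zip(query_embedding, candidate))
        for candidate in corpus_embeddings
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# Toy unit vectors standing in for real 1024-dimensional embeddings
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]
print(top_k(query, corpus, k=2))  # [(2, 0.8), (1, 0.6)]
```

For large corpora, a vectorized matrix product such as `model.similarity` is preferable to this per-vector loop.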
Training Details
Training Dataset
Unnamed Dataset
- Size: 21,769 training samples
- Columns: `sentence_0` and `sentence_1`
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:---|:---|:---|
| type | string | string |
| details | min: 4 tokens, mean: 122.97 tokens, max: 512 tokens | min: 17 tokens, mean: 38.24 tokens, max: 109 tokens |

- Samples:

| sentence_0 | sentence_1 |
|:---|:---|
| NEW HANDLING OF ALERT While the achieves 6,101,968 votes (i.e. 26.8%), the Ministry of the Interior only gives it 5,836,202 votes (i.e. 25.7%) to artificially make 's party appear in the lead . Hello Council of State? | The Ministry of the Interior manipulated the results of the legislative elections Legislative: why are the results of the 1st round contested by the Nupes? |
| <3<3... Civil Registry Offices in Brazil: The only source that does not lie, as it issues all death certificates daily, for all reasons. This source cannot be disputed by anyone. Only they can say for sure, how many people die each day, and the reason for death. The rest is fake news. Via Jose Mendes Junior Updating... Deaths in Brazil: July 2019 - 119,390 (without pandemic) July 2020 - 113,475 (with pandemic) Source: transparencia.registrocivil.org.br... Now what are they going to say???? | More deaths were recorded in Brazil in July 2019, before the pandemic, than in July 2020, during the new coronavirus pandemic. Publications use partial data on deaths recorded in July 2020 |
| Zimbabwe Police are taking disciplinary action with a church that refused to take closure instructions to prevent the spread of Coronavirus. | Worshipers beaten in Zimbabwe for failing to comply with coronavirus assembly ban No, worshipers have not been beaten by police in Zimbabwe for gathering during the coronavirus outbreak |

- Loss: `MultipleNegativesRankingLoss` with these parameters: `{"scale": 20.0, "similarity_fct": "cos_sim"}`
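MultipleNegativesRankingLoss treats each `(sentence_0, sentence_1)` row as a positive pair and every other `sentence_1` in the same batch as a negative. It multiplies the cosine similarities by `scale` (20.0 here) and applies cross-entropy with the matching pair as the target class. The pure-Python sketch below illustrates that formulation for plain pairs. The library also accepts extra hard-negative columns, which this dataset does not have. The sketch is a simplified illustration, not the library's code, and `multiple_negatives_ranking_loss` is a hypothetical name. With a `per_device_train_batch_size` of 2, each anchor is contrasted against only one in-batch negative.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy in which anchor i must pick positive i among all positives in the batch."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp over the row of scaled similarities
        m = max(logits)
        log_z = m + math.log(sum(math.exp(logit - m) for logit in logits))
        total += log_z - logits[i]  # negative log-softmax of the correct pair
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]  # positive i is close to anchor i
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positive i is close to the other anchor
print(multiple_negatives_ranking_loss(anchors, matched))  # close to 0
print(multiple_negatives_ranking_loss(anchors, swapped))  # large
```

The large `scale` sharpens the softmax, so even a modest similarity gap between the positive and the in-batch negative produces a near-zero loss.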
Training Hyperparameters
Non-Default Hyperparameters
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.0459 | 500 | 0.0148 |
0.0919 | 1000 | 0.0066 |
0.1378 | 1500 | 0.0245 |
0.1837 | 2000 | 0.0184 |
0.2297 | 2500 | 0.0174 |
0.2756 | 3000 | 0.0053 |
0.3215 | 3500 | 0.025 |
0.3675 | 4000 | 0.0105 |
0.4134 | 4500 | 0.0054 |
0.4593 | 5000 | 0.0076 |
0.5053 | 5500 | 0.0085 |
0.5512 | 6000 | 0.0104 |
0.5972 | 6500 | 0.0208 |
0.6431 | 7000 | 0.0072 |
0.6890 | 7500 | 0.0084 |
0.7350 | 8000 | 0.0053 |
0.7809 | 8500 | 0.0052 |
0.8268 | 9000 | 0.0064 |
0.8728 | 9500 | 0.0074 |
0.9187 | 10000 | 0.0083 |
0.9646 | 10500 | 0.008 |
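The Epoch column above can be checked against the training setup. The setup has 21,769 training pairs, a `per_device_train_batch_size` of 2, `dataloader_drop_last` set to False, and `gradient_accumulation_steps` of 1. Assuming a single device, one epoch is therefore ceil(21,769 / 2) = 10,885 optimizer steps. The short sketch below divides a few logged steps by that number and confirms it reproduces the logged epoch values to four decimals.

```python
import math

NUM_SAMPLES = 21_769  # training pairs, from the dataset section
BATCH_SIZE = 2        # per_device_train_batch_size
# Assumption: a single device and gradient_accumulation_steps = 1.
# The last partial batch is kept because dataloader_drop_last is False.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)

# A few (step, epoch) rows copied from the training log above
logged = {500: 0.0459, 5000: 0.4593, 8000: 0.7350, 10500: 0.9646}

for step, epoch in logged.items():
    assert round(step / steps_per_epoch, 4) == epoch, (step, epoch)

print(steps_per_epoch)  # 10885
```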
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```