SentenceTransformer based on FacebookAI/xlm-roberta-base

This is a sentence-transformers model fine-tuned from FacebookAI/xlm-roberta-base on the en-es dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: FacebookAI/xlm-roberta-base
Maximum Sequence Length: 128 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- en-es
Languages: en, multilingual, ar, bg, ca, cs, da, de, el, es, et, fa, fi, fr, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
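
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal plain-Python sketch of that idea, not the library's implementation; the function name mean_pool and the toy 2-dimensional vectors are made up for illustration (the real model works with 768-dimensional tensors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


# Toy example: three token slots of width 2; the last slot is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padding slot is excluded by the mask, which is why its large values do not affect the result.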

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vallabh001/xlm-roberta-base-multilingual-en-es")
# Run inference
sentences = [
    'We need a different machine.',
    'Necesitamos una máquina diferente.',
    'Entonces, ¿dónde nos deja esto?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Knowledge Distillation

Dataset: en-es
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-10.1836
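
To help read the negative_mse value, here is a rough sketch of how such a score can be computed: the mean squared difference between teacher and student embeddings, scaled and negated so that higher is better. The scaling factor of 100 is an assumption about how MSEEvaluator reports the number, and the function name and toy vectors are hypothetical.

```python
def negative_mse(teacher_vectors, student_vectors, scale=100.0):
    """Negated, scaled mean squared error over all embedding dimensions.

    `scale=100.0` is an assumed reporting convention, not a confirmed one.
    """
    total = 0.0
    n = 0
    for teacher, student in zip(teacher_vectors, student_vectors):
        for a, b in zip(teacher, student):
            total += (a - b) ** 2
            n += 1
    return -scale * total / n


# Toy data: two sentences with 2-dimensional embeddings.
teacher = [[0.5, -0.5], [1.0, 0.0]]
student = [[0.5, -0.5], [1.0, 0.5]]
score = negative_mse(teacher, student)
print(score)  # -6.25
```

Under that assumption, the reported -10.1836 would correspond to a mean squared error of roughly 0.10 per embedding dimension.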

Translation

Dataset: en-es
Evaluated with TranslationEvaluator

Metric	Value
src2trg_accuracy	0.9879
trg2src_accuracy	0.9909
mean_accuracy	0.9894
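
The translation metrics can be read as retrieval accuracy: for each sentence, is its true translation the nearest neighbour by cosine similarity among all candidates? The sketch below illustrates that reading in plain Python. It is an approximation of what TranslationEvaluator measures, and the helper names and toy vectors are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieval_accuracy(queries, candidates):
    """Fraction of queries whose most similar candidate has the same index."""
    hits = 0
    for i, query in enumerate(queries):
        scores = [cosine(query, candidate) for candidate in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += best == i
    return hits / len(queries)


# Toy embeddings: english[i] and spanish[i] are meant to be translations.
english = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spanish = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]

src2trg = retrieval_accuracy(english, spanish)
trg2src = retrieval_accuracy(spanish, english)
mean_accuracy = (src2trg + trg2src) / 2
print(src2trg, trg2src, mean_accuracy)
```

In this toy data the two directions differ (the third Spanish vector is closer to the second English one), which shows why both directions are reported. The mean_accuracy in the table is the average of the two: (0.9879 + 0.9909) / 2 = 0.9894.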

Semantic Similarity

Dataset: sts17-es-en-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7671
spearman_cosine	0.7903
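
pearson_cosine and spearman_cosine compare the model's cosine similarities for sentence pairs against human similarity ratings. As a dependency-free illustration of the two correlations (the evaluator itself may rely on a statistics library; the function names and the four toy scores here are invented):

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: gold ratings and model cosine scores for four sentence pairs.
gold = [0.0, 1.5, 3.0, 5.0]
cosine_scores = [0.10, 0.35, 0.30, 0.90]
print(pearson(gold, cosine_scores), spearman(gold, cosine_scores))
```

Spearman only looks at the ordering of the pairs, so it is insensitive to how the cosine values are scaled; in the toy data one swapped pair lowers it to about 0.8.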

Training Details

Training Dataset

en-es

Dataset: en-es at 0c70bc6
Size: 404,981 training samples
Columns: english, non_english, and label
Approximate statistics based on the first 1000 samples:

	english	non_english	label
type	string	string	list
details	min: 4 tokens mean: 25.77 tokens max: 128 tokens	min: 4 tokens mean: 25.42 tokens max: 128 tokens	size: 768 elements

Samples:

english	non_english	label
`And then there are certain conceptual things that can also benefit from hand calculating, but I think they're relatively small in number.`	`Y luego hay ciertas aspectos conceptuales que pueden beneficiarse del cálculo a mano pero creo que son relativamente pocos.`	`[-0.59398353099823, 0.9714106321334839, 0.6800687313079834, -0.21585586667060852, -0.7509507536888123, ...]`
`One thing I often ask about is ancient Greek and how this relates.`	`Algo que pregunto a menudo es sobre el griego antiguo y cómo se relaciona.`	`[-0.09777131676673889, 0.07093200832605362, -0.42989036440849304, -0.1457505226135254, 1.4382765293121338, ...]`
`See, the thing we're doing right now is we're forcing people to learn mathematics.`	`Vean, lo que estamos haciendo ahora es forzar a la gente a aprender matemáticas.`	`[0.39432215690612793, 0.1891053169965744, -0.3788300156593323, 0.438666433095932, 0.2727019190788269, ...]`

Loss: MSELoss
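
Each training row pairs an English sentence and its translation with a 768-element label, which matches the knowledge-distillation setup in the MSELoss citation: the label acts as a teacher embedding, and the student is trained so that its embeddings of both sentences land close to it. A hedged sketch of that objective follows; averaging the two terms is an assumption (the paper describes a sum), and the function names and 2-dimensional vectors are illustrative only.

```python
def mse(a, b):
    """Mean squared error between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_vec, student_english_vec, student_translated_vec):
    """Pull both student embeddings toward the same teacher vector.

    Averaging the two terms is an assumption made for this sketch.
    """
    english_term = mse(teacher_vec, student_english_vec)
    translated_term = mse(teacher_vec, student_translated_vec)
    return (english_term + translated_term) / 2


# Toy vectors standing in for the 768-element label and student outputs.
teacher_vec = [1.0, 0.0]
student_en = [1.0, 0.0]   # already matches the teacher
student_es = [0.5, 0.5]   # still off target
loss = distillation_loss(teacher_vec, student_en, student_es)
print(loss)  # 0.125
```

Because both languages are pulled toward one target, translations end up near each other in the vector space, which is what the Translation metrics measure.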

Evaluation Dataset

en-es

Dataset: en-es at 0c70bc6
Size: 990 evaluation samples
Columns: english, non_english, and label
Approximate statistics based on the first 990 samples:

	english	non_english	label
type	string	string	list
details	min: 4 tokens mean: 26.42 tokens max: 128 tokens	min: 4 tokens mean: 26.47 tokens max: 128 tokens	size: 768 elements

Samples:

english	non_english	label
`Thank you so much, Chris.`	`Muchas gracias Chris.`	`[-0.43312570452690125, 1.0602686405181885, -0.07791059464216232, -0.41704198718070984, 1.676845908164978, ...]`
`And it's truly a great honor to have the opportunity to come to this stage twice; I'm extremely grateful.`	`Y es en verdad un gran honor tener la oportunidad de venir a este escenario por segunda vez. Estoy extremadamente agradecido.`	`[0.27005693316459656, 0.5391747951507568, -0.2580487132072449, -0.6613675951957703, 0.6738824248313904, ...]`
`I have been blown away by this conference, and I want to thank all of you for the many nice comments about what I had to say the other night.`	`He quedado conmovido por esta conferencia, y deseo agradecer a todos ustedes sus amables comentarios acerca de lo que tenía que decir la otra noche.`	`[-0.2532017230987549, 0.04791336879134178, -0.1317490190267563, -0.7357572913169861, 0.23663584887981415, ...]`

Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
learning_rate: 2e-05
num_train_epochs: 5
warmup_ratio: 0.1
bf16: True
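
As a worked example of what these settings imply, the sketch below derives the step counts and a linear warmup-then-decay learning-rate curve from the values listed in this card. It assumes a single training device and that lr_scheduler_type: linear behaves like a standard linear warmup followed by linear decay to zero; the constant and function names are made up for illustration.

```python
import math

TRAIN_SAMPLES = 404_981  # "Size" of the training dataset
BATCH_SIZE = 64          # per_device_train_batch_size (single device assumed)
EPOCHS = 5               # num_train_epochs
BASE_LR = 2e-5           # learning_rate
WARMUP_RATIO = 0.1       # warmup_ratio

# The last, partial batch still counts as a step (dataloader_drop_last: False).
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at a given optimizer step under the assumed schedule."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return BASE_LR * max(0.0, remaining)


print(steps_per_epoch, total_steps)  # 6328 31640
print(linear_lr(warmup_steps), linear_lr(total_steps))
```

The derived 6,328 steps per epoch agrees with the training logs, where step 100 is reported at epoch 0.0158 (100 / 6328). Warmup covers roughly the first 3,164 steps, after which the rate decays linearly to zero.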

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	en-es loss	en-es_negative_mse	en-es_mean_accuracy	sts17-es-en-test_spearman_cosine
0.0158	100	0.6528	-	-	-	-
0.0316	200	0.5634	-	-	-	-
0.0474	300	0.4418	-	-	-	-
0.0632	400	0.3009	-	-	-	-
0.0790	500	0.2744	-	-	-	-
0.0948	600	0.2677	-	-	-	-
0.1106	700	0.2661	-	-	-	-
0.1264	800	0.2614	-	-	-	-
0.1422	900	0.2583	-	-	-	-
0.1580	1000	0.2582	-	-	-	-
0.1738	1100	0.2579	-	-	-	-
0.1896	1200	0.256	-	-	-	-
0.2054	1300	0.2511	-	-	-	-
0.2212	1400	0.2467	-	-	-	-
0.2370	1500	0.2423	-	-	-	-
0.2528	1600	0.2364	-	-	-	-
0.2686	1700	0.2305	-	-	-	-
0.2845	1800	0.2248	-	-	-	-
0.3003	1900	0.2184	-	-	-	-
0.3161	2000	0.2143	-	-	-	-
0.3319	2100	0.2098	-	-	-	-
0.3477	2200	0.2055	-	-	-	-
0.3635	2300	0.1999	-	-	-	-
0.3793	2400	0.1965	-	-	-	-
0.3951	2500	0.1919	-	-	-	-
0.4109	2600	0.1889	-	-	-	-
0.4267	2700	0.1858	-	-	-	-
0.4425	2800	0.1826	-	-	-	-
0.4583	2900	0.18	-	-	-	-
0.4741	3000	0.1774	-	-	-	-
0.4899	3100	0.1758	-	-	-	-
0.5057	3200	0.1738	-	-	-	-
0.5215	3300	0.1706	-	-	-	-
0.5373	3400	0.1678	-	-	-	-
0.5531	3500	0.1664	-	-	-	-
0.5689	3600	0.1647	-	-	-	-
0.5847	3700	0.163	-	-	-	-
0.6005	3800	0.1605	-	-	-	-
0.6163	3900	0.1594	-	-	-	-
0.6321	4000	0.1576	-	-	-	-
0.6479	4100	0.1561	-	-	-	-
0.6637	4200	0.1541	-	-	-	-
0.6795	4300	0.1545	-	-	-	-
0.6953	4400	0.1535	-	-	-	-
0.7111	4500	0.1523	-	-	-	-
0.7269	4600	0.1502	-	-	-	-
0.7427	4700	0.1487	-	-	-	-
0.7585	4800	0.1486	-	-	-	-
0.7743	4900	0.1477	-	-	-	-
0.7901	5000	0.1465	0.1390	-14.681906	0.9803	0.6371
0.8059	5100	0.1469	-	-	-	-
0.8217	5200	0.1449	-	-	-	-
0.8375	5300	0.1437	-	-	-	-
0.8534	5400	0.142	-	-	-	-
0.8692	5500	0.1423	-	-	-	-
0.8850	5600	0.1424	-	-	-	-
0.9008	5700	0.1415	-	-	-	-
0.9166	5800	0.1407	-	-	-	-
0.9324	5900	0.1396	-	-	-	-
0.9482	6000	0.1388	-	-	-	-
0.9640	6100	0.1391	-	-	-	-
0.9798	6200	0.1368	-	-	-	-
0.9956	6300	0.1366	-	-	-	-
1.0114	6400	0.1367	-	-	-	-
1.0272	6500	0.1343	-	-	-	-
1.0430	6600	0.1341	-	-	-	-
1.0588	6700	0.1349	-	-	-	-
1.0746	6800	0.1327	-	-	-	-
1.0904	6900	0.1334	-	-	-	-
1.1062	7000	0.133	-	-	-	-
1.1220	7100	0.1316	-	-	-	-
1.1378	7200	0.1308	-	-	-	-
1.1536	7300	0.1316	-	-	-	-
1.1694	7400	0.1298	-	-	-	-
1.1852	7500	0.1294	-	-	-	-
1.2010	7600	0.1295	-	-	-	-
1.2168	7700	0.13	-	-	-	-
1.2326	7800	0.1285	-	-	-	-
1.2484	7900	0.1278	-	-	-	-
1.2642	8000	0.1272	-	-	-	-
1.2800	8100	0.1262	-	-	-	-
1.2958	8200	0.1275	-	-	-	-
1.3116	8300	0.1266	-	-	-	-
1.3274	8400	0.1252	-	-	-	-
1.3432	8500	0.1256	-	-	-	-
1.3590	8600	0.1246	-	-	-	-
1.3748	8700	0.1254	-	-	-	-
1.3906	8800	0.1242	-	-	-	-
1.4064	8900	0.1249	-	-	-	-
1.4223	9000	0.1233	-	-	-	-
1.4381	9100	0.1238	-	-	-	-
1.4539	9200	0.1231	-	-	-	-
1.4697	9300	0.122	-	-	-	-
1.4855	9400	0.1217	-	-	-	-
1.5013	9500	0.1225	-	-	-	-
1.5171	9600	0.1213	-	-	-	-
1.5329	9700	0.1208	-	-	-	-
1.5487	9800	0.1214	-	-	-	-
1.5645	9900	0.1205	-	-	-	-
1.5803	10000	0.12	0.1120	-12.20076	0.9843	0.7137
1.5961	10100	0.1205	-	-	-	-
1.6119	10200	0.12	-	-	-	-
1.6277	10300	0.1187	-	-	-	-
1.6435	10400	0.1184	-	-	-	-
1.6593	10500	0.1178	-	-	-	-
1.6751	10600	0.1188	-	-	-	-
1.6909	10700	0.1184	-	-	-	-
1.7067	10800	0.1168	-	-	-	-
1.7225	10900	0.1175	-	-	-	-
1.7383	11000	0.1158	-	-	-	-
1.7541	11100	0.1159	-	-	-	-
1.7699	11200	0.1178	-	-	-	-
1.7857	11300	0.1158	-	-	-	-
1.8015	11400	0.1161	-	-	-	-
1.8173	11500	0.1151	-	-	-	-
1.8331	11600	0.1147	-	-	-	-
1.8489	11700	0.1152	-	-	-	-
1.8647	11800	0.1144	-	-	-	-
1.8805	11900	0.1145	-	-	-	-
1.8963	12000	0.1144	-	-	-	-
1.9121	12100	0.1139	-	-	-	-
1.9279	12200	0.1144	-	-	-	-
1.9437	12300	0.1144	-	-	-	-
1.9595	12400	0.1124	-	-	-	-
1.9753	12500	0.1134	-	-	-	-
1.9912	12600	0.1133	-	-	-	-
2.0070	12700	0.1125	-	-	-	-
2.0228	12800	0.1108	-	-	-	-
2.0386	12900	0.1112	-	-	-	-
2.0544	13000	0.1109	-	-	-	-
2.0702	13100	0.1105	-	-	-	-
2.0860	13200	0.1112	-	-	-	-
2.1018	13300	0.1105	-	-	-	-
2.1176	13400	0.1105	-	-	-	-
2.1334	13500	0.11	-	-	-	-
2.1492	13600	0.1096	-	-	-	-
2.1650	13700	0.1098	-	-	-	-
2.1808	13800	0.1093	-	-	-	-
2.1966	13900	0.1089	-	-	-	-
2.2124	14000	0.1091	-	-	-	-
2.2282	14100	0.1091	-	-	-	-
2.2440	14200	0.1086	-	-	-	-
2.2598	14300	0.1089	-	-	-	-
2.2756	14400	0.1087	-	-	-	-
2.2914	14500	0.1083	-	-	-	-
2.3072	14600	0.1091	-	-	-	-
2.3230	14700	0.1083	-	-	-	-
2.3388	14800	0.1088	-	-	-	-
2.3546	14900	0.1071	-	-	-	-
2.3704	15000	0.1085	0.1015	-11.243325	0.9843	0.7625
2.3862	15100	0.1077	-	-	-	-
2.4020	15200	0.1076	-	-	-	-
2.4178	15300	0.108	-	-	-	-
2.4336	15400	0.1066	-	-	-	-
2.4494	15500	0.1062	-	-	-	-
2.4652	15600	0.1065	-	-	-	-
2.4810	15700	0.1058	-	-	-	-
2.4968	15800	0.1071	-	-	-	-
2.5126	15900	0.1071	-	-	-	-
2.5284	16000	0.1066	-	-	-	-
2.5442	16100	0.1067	-	-	-	-
2.5601	16200	0.1057	-	-	-	-
2.5759	16300	0.106	-	-	-	-
2.5917	16400	0.1061	-	-	-	-
2.6075	16500	0.1047	-	-	-	-
2.6233	16600	0.1057	-	-	-	-
2.6391	16700	0.106	-	-	-	-
2.6549	16800	0.1055	-	-	-	-
2.6707	16900	0.105	-	-	-	-
2.6865	17000	0.1047	-	-	-	-
2.7023	17100	0.1042	-	-	-	-
2.7181	17200	0.1057	-	-	-	-
2.7339	17300	0.1051	-	-	-	-
2.7497	17400	0.1055	-	-	-	-
2.7655	17500	0.1047	-	-	-	-
2.7813	17600	0.1043	-	-	-	-
2.7971	17700	0.1034	-	-	-	-
2.8129	17800	0.1039	-	-	-	-
2.8287	17900	0.1038	-	-	-	-
2.8445	18000	0.1032	-	-	-	-
2.8603	18100	0.103	-	-	-	-
2.8761	18200	0.1035	-	-	-	-
2.8919	18300	0.1024	-	-	-	-
2.9077	18400	0.1032	-	-	-	-
2.9235	18500	0.1031	-	-	-	-
2.9393	18600	0.1034	-	-	-	-
2.9551	18700	0.1033	-	-	-	-
2.9709	18800	0.1036	-	-	-	-
2.9867	18900	0.1029	-	-	-	-
3.0025	19000	0.1024	-	-	-	-
3.0183	19100	0.1017	-	-	-	-
3.0341	19200	0.1012	-	-	-	-
3.0499	19300	0.1016	-	-	-	-
3.0657	19400	0.1012	-	-	-	-
3.0815	19500	0.1009	-	-	-	-
3.0973	19600	0.1015	-	-	-	-
3.1131	19700	0.1014	-	-	-	-
3.1290	19800	0.1004	-	-	-	-
3.1448	19900	0.1011	-	-	-	-
3.1606	20000	0.1006	0.0952	-10.662492	0.9879	0.7811
3.1764	20100	0.1007	-	-	-	-
3.1922	20200	0.1015	-	-	-	-
3.2080	20300	0.1005	-	-	-	-
3.2238	20400	0.1017	-	-	-	-
3.2396	20500	0.1012	-	-	-	-
3.2554	20600	0.0998	-	-	-	-
3.2712	20700	0.0997	-	-	-	-
3.2870	20800	0.1001	-	-	-	-
3.3028	20900	0.1009	-	-	-	-
3.3186	21000	0.1	-	-	-	-
3.3344	21100	0.1001	-	-	-	-
3.3502	21200	0.1008	-	-	-	-
3.3660	21300	0.0996	-	-	-	-
3.3818	21400	0.0993	-	-	-	-
3.3976	21500	0.1004	-	-	-	-
3.4134	21600	0.0996	-	-	-	-
3.4292	21700	0.0993	-	-	-	-
3.4450	21800	0.0997	-	-	-	-
3.4608	21900	0.0997	-	-	-	-
3.4766	22000	0.0997	-	-	-	-
3.4924	22100	0.0984	-	-	-	-
3.5082	22200	0.0999	-	-	-	-
3.5240	22300	0.099	-	-	-	-
3.5398	22400	0.0992	-	-	-	-
3.5556	22500	0.0988	-	-	-	-
3.5714	22600	0.0989	-	-	-	-
3.5872	22700	0.0989	-	-	-	-
3.6030	22800	0.0978	-	-	-	-
3.6188	22900	0.0987	-	-	-	-
3.6346	23000	0.0997	-	-	-	-
3.6504	23100	0.0994	-	-	-	-
3.6662	23200	0.0984	-	-	-	-
3.6820	23300	0.0985	-	-	-	-
3.6979	23400	0.0983	-	-	-	-
3.7137	23500	0.0992	-	-	-	-
3.7295	23600	0.0983	-	-	-	-
3.7453	23700	0.0987	-	-	-	-
3.7611	23800	0.0983	-	-	-	-
3.7769	23900	0.0969	-	-	-	-
3.7927	24000	0.0984	-	-	-	-
3.8085	24100	0.0976	-	-	-	-
3.8243	24200	0.0984	-	-	-	-
3.8401	24300	0.0974	-	-	-	-
3.8559	24400	0.0982	-	-	-	-
3.8717	24500	0.0983	-	-	-	-
3.8875	24600	0.0986	-	-	-	-
3.9033	24700	0.0977	-	-	-	-
3.9191	24800	0.0974	-	-	-	-
3.9349	24900	0.0979	-	-	-	-
3.9507	25000	0.0974	0.0916	-10.330441	0.9904	0.7840
3.9665	25100	0.0974	-	-	-	-
3.9823	25200	0.097	-	-	-	-
3.9981	25300	0.0978	-	-	-	-
4.0139	25400	0.0969	-	-	-	-
4.0297	25500	0.0966	-	-	-	-
4.0455	25600	0.0965	-	-	-	-
4.0613	25700	0.0974	-	-	-	-
4.0771	25800	0.0966	-	-	-	-
4.0929	25900	0.0964	-	-	-	-
4.1087	26000	0.0961	-	-	-	-
4.1245	26100	0.0958	-	-	-	-
4.1403	26200	0.0964	-	-	-	-
4.1561	26300	0.097	-	-	-	-
4.1719	26400	0.0967	-	-	-	-
4.1877	26500	0.0968	-	-	-	-
4.2035	26600	0.0965	-	-	-	-
4.2193	26700	0.0956	-	-	-	-
4.2351	26800	0.0963	-	-	-	-
4.2509	26900	0.0958	-	-	-	-
4.2668	27000	0.0969	-	-	-	-
4.2826	27100	0.0951	-	-	-	-
4.2984	27200	0.0958	-	-	-	-
4.3142	27300	0.0956	-	-	-	-
4.3300	27400	0.0965	-	-	-	-
4.3458	27500	0.0952	-	-	-	-
4.3616	27600	0.0956	-	-	-	-
4.3774	27700	0.0956	-	-	-	-
4.3932	27800	0.0966	-	-	-	-
4.4090	27900	0.0972	-	-	-	-
4.4248	28000	0.0954	-	-	-	-
4.4406	28100	0.0961	-	-	-	-
4.4564	28200	0.0963	-	-	-	-
4.4722	28300	0.0958	-	-	-	-
4.4880	28400	0.0961	-	-	-	-
4.5038	28500	0.0961	-	-	-	-
4.5196	28600	0.0956	-	-	-	-
4.5354	28700	0.0955	-	-	-	-
4.5512	28800	0.0957	-	-	-	-
4.5670	28900	0.0953	-	-	-	-
4.5828	29000	0.0952	-	-	-	-
4.5986	29100	0.0964	-	-	-	-
4.6144	29200	0.0955	-	-	-	-
4.6302	29300	0.0948	-	-	-	-
4.6460	29400	0.0946	-	-	-	-
4.6618	29500	0.0953	-	-	-	-
4.6776	29600	0.0954	-	-	-	-
4.6934	29700	0.0956	-	-	-	-
4.7092	29800	0.0958	-	-	-	-
4.7250	29900	0.0956	-	-	-	-
4.7408	30000	0.0962	0.0900	-10.183619	0.9894	0.7903
4.7566	30100	0.0953	-	-	-	-
4.7724	30200	0.0959	-	-	-	-
4.7882	30300	0.0949	-	-	-	-
4.8040	30400	0.0958	-	-	-	-
4.8198	30500	0.0952	-	-	-	-
4.8357	30600	0.0952	-	-	-	-
4.8515	30700	0.095	-	-	-	-
4.8673	30800	0.0949	-	-	-	-
4.8831	30900	0.0949	-	-	-	-
4.8989	31000	0.0953	-	-	-	-
4.9147	31100	0.0955	-	-	-	-
4.9305	31200	0.0964	-	-	-	-
4.9463	31300	0.0955	-	-	-	-
4.9621	31400	0.0955	-	-	-	-
4.9779	31500	0.0954	-	-	-	-
4.9937	31600	0.0959	-	-	-	-

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.3.1
Transformers: 4.46.3
PyTorch: 2.5.1+cu124
Accelerate: 1.2.1
Datasets: 3.2.0
Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
