tomaarsen committed
Commit ae27081 · verified · 1 Parent(s): 86e09ad

Update README.md

Files changed (1):
  1. README.md +404 -136
README.md CHANGED
@@ -1,137 +1,405 @@
- ---
- tags:
- - sentence-transformers
- - cross-encoder
- - text-classification
- pipeline_tag: text-classification
- library_name: sentence-transformers
- ---
-
- # CrossEncoder
-
- This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model trained using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
-
- ## Model Details
-
- ### Model Description
- - **Model Type:** Cross Encoder
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- - **Maximum Sequence Length:** 8192 tokens
- - **Number of Output Labels:** 1 label
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
-
- ## Usage
-
- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:
-
- ```bash
- pip install -U sentence-transformers
- ```
-
- Then you can load this model and run inference.
- ```python
- from sentence_transformers import CrossEncoder
-
- # Download from the 🤗 Hub
- model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce-static-retriever-hardest")
- # Get scores for pairs of texts
- pairs = [
-     ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
-     ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
-     ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
- ]
- scores = model.predict(pairs)
- print(scores.shape)
- # (3,)
-
- # Or rank different texts based on similarity to a single text
- ranks = model.rank(
-     'How many calories in an egg',
-     [
-         'There are on average between 55 and 80 calories in an egg depending on its size.',
-         'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
-         'Most of the calories in an egg come from the yellow yolk in the center.',
-     ]
- )
- # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
- ```
-
- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
- ## Training Details
-
- ### Framework Versions
- - Python: 3.11.6
- - Sentence Transformers: 3.5.0.dev0
- - Transformers: 4.48.3
- - PyTorch: 2.5.0+cu121
- - Accelerate: 1.3.0
- - Datasets: 2.20.0
- - Tokenizers: 0.21.0
-
- ## Citation
-
- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ ---
+ tags:
+ - sentence-transformers
+ - cross-encoder
+ - text-classification
+ - generated_from_trainer
+ - dataset_size:578402
+ - loss:BinaryCrossEntropyLoss
+ base_model: answerdotai/ModernBERT-base
+ pipeline_tag: text-classification
+ library_name: sentence-transformers
+ metrics:
+ - map
+ - mrr@10
+ - ndcg@10
+ model-index:
+ - name: CrossEncoder based on answerdotai/ModernBERT-base
+   results: []
+ ---
+
+ # CrossEncoder based on answerdotai/ModernBERT-base
+
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Cross Encoder
+ - **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) <!-- at revision 8949b909ec900327062f0ebf497f51aef5e6f0c8 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Number of Output Labels:** 1 label
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ # Download from the 🤗 Hub
+ model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce-static-retriever-hardest")
+ # Get scores for pairs of texts
+ pairs = [
+     ["how to obtain a teacher's certificate in texas?", '["Step 1: Obtain a Bachelor\'s Degree. One of the most important Texas teacher qualifications is a bachelor\'s degree. ... ", \'Step 2: Complete an Educator Preparation Program (EPP) ... \', \'Step 3: Pass Texas Teacher Certification Exams. ... \', \'Step 4: Complete a Final Application and Background Check.\']'],
+     ["how to obtain a teacher's certificate in texas?", 'Teacher education programs may take 4 years to complete after which certification plans are prepared for a three year period. During this plan period, the teacher must obtain a Standard Certification within 1-2 years. Learn how to get certified to teach in Texas.'],
+     ["how to obtain a teacher's certificate in texas?", "Washington Teachers Licensing Application Process Official transcripts showing proof of bachelor's degree. Proof of teacher program completion at an approved teacher preparation school. Passing scores on the required examinations. Completed application for teacher certification in Washington."],
+     ["how to obtain a teacher's certificate in texas?", 'Some aspiring educators may be confused about the difference between teaching certification and teaching certificates. Teacher certification is another term for the licensure required to teach in public schools, while a teaching certificate is awarded upon completion of an academic program.'],
+     ["how to obtain a teacher's certificate in texas?", 'In Texas, the minimum age to work is 14. Unlike some states, Texas does not require juvenile workers to obtain a child employment certificate or an age certificate to work. A prospective employer that wants one can request a certificate of age for any minors it employs, obtainable from the Texas Workforce Commission.'],
+ ]
+ scores = model.predict(pairs)
+ print(scores.shape)
+ # (5,)
+
+ # Or rank different texts based on similarity to a single text
+ ranks = model.rank(
+     "how to obtain a teacher's certificate in texas?",
+     [
+         '["Step 1: Obtain a Bachelor\'s Degree. One of the most important Texas teacher qualifications is a bachelor\'s degree. ... ", \'Step 2: Complete an Educator Preparation Program (EPP) ... \', \'Step 3: Pass Texas Teacher Certification Exams. ... \', \'Step 4: Complete a Final Application and Background Check.\']',
+         'Teacher education programs may take 4 years to complete after which certification plans are prepared for a three year period. During this plan period, the teacher must obtain a Standard Certification within 1-2 years. Learn how to get certified to teach in Texas.',
+         "Washington Teachers Licensing Application Process Official transcripts showing proof of bachelor's degree. Proof of teacher program completion at an approved teacher preparation school. Passing scores on the required examinations. Completed application for teacher certification in Washington.",
+         'Some aspiring educators may be confused about the difference between teaching certification and teaching certificates. Teacher certification is another term for the licensure required to teach in public schools, while a teaching certificate is awarded upon completion of an academic program.',
+         'In Texas, the minimum age to work is 14. Unlike some states, Texas does not require juvenile workers to obtain a child employment certificate or an age certificate to work. A prospective employer that wants one can request a certificate of age for any minors it employs, obtainable from the Texas Workforce Commission.',
+     ]
+ )
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
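+
+ Since sentence-transformers cross-encoders are saved as standard sequence-classification checkpoints, the model can also be scored with `transformers` directly. Below is a minimal sketch (not the card's generated usage), assuming the checkpoint loads via `AutoModelForSequenceClassification` and a recent `transformers` release with ModernBERT support; the passages are abbreviated examples:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_id = "tomaarsen/reranker-ModernBERT-base-gooaq-bce-static-retriever-hardest"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model.eval()
+
+ # Tokenize (query, passage) pairs jointly, as a cross-encoder expects
+ queries = ["how to obtain a teacher's certificate in texas?"] * 2
+ passages = [
+     "Step 1: Obtain a Bachelor's Degree. One of the most important Texas teacher qualifications is a bachelor's degree.",
+     "In Texas, the minimum age to work is 14.",
+ ]
+ inputs = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")
+ with torch.no_grad():
+     scores = model(**inputs).logits.squeeze(-1)  # one relevance logit per pair
+ print(scores)
+ ```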
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
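+
+ You can also finetune this model on your own dataset. A minimal sketch, assuming the `CrossEncoderTrainer` API from newer sentence-transformers releases and a tiny hypothetical dataset that mirrors this model's `(question, answer, label)` training format:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers.cross_encoder import CrossEncoder, CrossEncoderTrainer
+ from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss
+
+ # Hypothetical toy dataset with the same columns as the training data below
+ train_dataset = Dataset.from_dict({
+     "question": ["how to obtain a teacher's certificate in texas?"] * 2,
+     "answer": [
+         "Step 1: Obtain a Bachelor's Degree. One of the most important Texas teacher qualifications is a bachelor's degree.",
+         "In Texas, the minimum age to work is 14.",
+     ],
+     "label": [1, 0],
+ })
+
+ model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce-static-retriever-hardest")
+ loss = BinaryCrossEntropyLoss(model)
+
+ trainer = CrossEncoderTrainer(model=model, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```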
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Cross Encoder Reranking
+
+ * Datasets: `gooaq-dev`, `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
+ * Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)
+
+ | Metric      | gooaq-dev            | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
+ |:------------|:---------------------|:---------------------|:---------------------|:---------------------|
+ | map         | 0.7821 (+0.2485)     | 0.4373 (-0.0523)     | 0.3354 (+0.0650)     | 0.5305 (+0.1098)     |
+ | mrr@10      | 0.7800 (+0.2560)     | 0.4288 (-0.0487)     | 0.4934 (-0.0064)     | 0.5326 (+0.1059)     |
+ | **ndcg@10** | **0.8269 (+0.2356)** | **0.5287 (-0.0117)** | **0.3612 (+0.0361)** | **0.5823 (+0.0817)** |
+
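+ These metrics come from the evaluator linked above. A minimal sketch of running it yourself with hypothetical data, assuming the `{query, positive, negative}` sample format the evaluator documentation describes:
+
+ ```python
+ from sentence_transformers import CrossEncoder
+ from sentence_transformers.cross_encoder.evaluation import CERerankingEvaluator
+
+ model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce-static-retriever-hardest")
+
+ # Hypothetical evaluation samples: each query with known positive and negative passages
+ samples = [
+     {
+         "query": "how to obtain a teacher's certificate in texas?",
+         "positive": ["Step 1: Obtain a Bachelor's Degree. One of the most important Texas teacher qualifications is a bachelor's degree."],
+         "negative": ["In Texas, the minimum age to work is 14."],
+     },
+ ]
+ evaluator = CERerankingEvaluator(samples, name="example-dev")
+ results = evaluator(model)  # computes the reranking metrics reported above
+ print(results)
+ ```
+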
+ #### Cross Encoder Nano BEIR
+
+ * Dataset: `NanoBEIR_mean`
+ * Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)
+
+ | Metric      | Value                |
+ |:------------|:---------------------|
+ | map         | 0.4344 (+0.0408)     |
+ | mrr@10      | 0.4849 (+0.0169)     |
+ | **ndcg@10** | **0.4907 (+0.0354)** |
+
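+ The NanoBEIR mean can be recomputed with the evaluator linked above. A sketch, assuming a `dataset_names` argument that selects the same three subsets reported in the reranking table:
+
+ ```python
+ from sentence_transformers import CrossEncoder
+ from sentence_transformers.cross_encoder.evaluation import CENanoBEIREvaluator
+
+ model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce-static-retriever-hardest")
+
+ # Runs retrieval + reranking on the small NanoBEIR subsets and averages the metrics
+ evaluator = CENanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
+ results = evaluator(model)
+ print(results)  # includes keys such as NanoBEIR_mean_ndcg@10, as in the training logs below
+ ```
+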
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 578,402 training samples
+ * Columns: <code>question</code>, <code>answer</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | question                                                                                        | answer                                                                                            | label                                           |
+   |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:------------------------------------------------|
+   | type    | string                                                                                          | string                                                                                            | int                                             |
+   | details | <ul><li>min: 19 characters</li><li>mean: 43.6 characters</li><li>max: 100 characters</li></ul>  | <ul><li>min: 56 characters</li><li>mean: 251.22 characters</li><li>max: 387 characters</li></ul>  | <ul><li>0: ~82.90%</li><li>1: ~17.10%</li></ul> |
+ * Samples:
+   | question                                                      | answer                                                                                                                                                                                                                                                                                                                          | label          |
+   |:--------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+   | <code>how to obtain a teacher's certificate in texas?</code>  | <code>["Step 1: Obtain a Bachelor's Degree. One of the most important Texas teacher qualifications is a bachelor's degree. ... ", 'Step 2: Complete an Educator Preparation Program (EPP) ... ', 'Step 3: Pass Texas Teacher Certification Exams. ... ', 'Step 4: Complete a Final Application and Background Check.']</code>  | <code>1</code> |
+   | <code>how to obtain a teacher's certificate in texas?</code>  | <code>Teacher education programs may take 4 years to complete after which certification plans are prepared for a three year period. During this plan period, the teacher must obtain a Standard Certification within 1-2 years. Learn how to get certified to teach in Texas.</code>                                           | <code>0</code> |
+   | <code>how to obtain a teacher's certificate in texas?</code>  | <code>Washington Teachers Licensing Application Process Official transcripts showing proof of bachelor's degree. Proof of teacher program completion at an approved teacher preparation school. Passing scores on the required examinations. Completed application for teacher certification in Washington.</code>             | <code>0</code> |
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
+   ```json
+   {
+       "activation_fct": "torch.nn.modules.linear.Identity",
+       "pos_weight": 5
+   }
+   ```
+
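+ For reference, a sketch of how this loss configuration could be constructed, assuming a recent sentence-transformers release with cross-encoder losses; `pos_weight` is passed as a tensor to the underlying `BCEWithLogitsLoss`:
+
+ ```python
+ import torch
+ from sentence_transformers import CrossEncoder
+ from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss
+
+ model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)
+
+ # pos_weight=5 upweights positives to offset the ~83%/17% negative/positive split above
+ loss = BinaryCrossEntropyLoss(model, pos_weight=torch.tensor(5.0))
+ ```
+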
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `seed`: 12
+ - `bf16`: True
+ - `dataloader_num_workers`: 4
+ - `load_best_model_at_end`: True
+
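+ The non-default values above map directly onto training arguments. A sketch, assuming the `CrossEncoderTrainingArguments` class from newer sentence-transformers releases (a subclass of `transformers.TrainingArguments`) and a hypothetical output directory:
+
+ ```python
+ from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments
+
+ args = CrossEncoderTrainingArguments(
+     output_dir="models/reranker-ModernBERT-base-gooaq-bce",  # hypothetical
+     eval_strategy="steps",
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=64,
+     learning_rate=2e-5,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+     seed=12,
+     bf16=True,
+     dataloader_num_workers=4,
+     load_best_model_at_end=True,
+ )
+ ```
+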
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 12
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 4
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch      | Step     | Training Loss | gooaq-dev_ndcg@10    | NanoMSMARCO_ndcg@10  | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10       | NanoBEIR_mean_ndcg@10 |
+ |:----------:|:--------:|:-------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
+ | -1         | -1       | -             | 0.1541 (-0.4371)     | 0.0273 (-0.5131)     | 0.3068 (-0.0182)     | 0.0340 (-0.4666)     | 0.1227 (-0.3326)      |
+ | 0.0001     | 1        | 1.3693        | -                    | -                    | -                    | -                    | -                     |
+ | 0.0221     | 200      | 1.1942        | -                    | -                    | -                    | -                    | -                     |
+ | 0.0443     | 400      | 1.1542        | -                    | -                    | -                    | -                    | -                     |
+ | 0.0664     | 600      | 0.9421        | -                    | -                    | -                    | -                    | -                     |
+ | 0.0885     | 800      | 0.7253        | -                    | -                    | -                    | -                    | -                     |
+ | 0.1106     | 1000     | 0.6955        | 0.7578 (+0.1666)     | 0.4930 (-0.0474)     | 0.3038 (-0.0212)     | 0.6047 (+0.1040)     | 0.4672 (+0.0118)      |
+ | 0.1328     | 1200     | 0.6236        | -                    | -                    | -                    | -                    | -                     |
+ | 0.1549     | 1400     | 0.6155        | -                    | -                    | -                    | -                    | -                     |
+ | 0.1770     | 1600     | 0.6102        | -                    | -                    | -                    | -                    | -                     |
+ | 0.1992     | 1800     | 0.5621        | -                    | -                    | -                    | -                    | -                     |
+ | 0.2213     | 2000     | 0.571         | 0.7910 (+0.1998)     | 0.5230 (-0.0174)     | 0.3468 (+0.0217)     | 0.5689 (+0.0683)     | 0.4796 (+0.0242)      |
+ | 0.2434     | 2200     | 0.5575        | -                    | -                    | -                    | -                    | -                     |
+ | 0.2655     | 2400     | 0.5539        | -                    | -                    | -                    | -                    | -                     |
+ | 0.2877     | 2600     | 0.5507        | -                    | -                    | -                    | -                    | -                     |
+ | 0.3098     | 2800     | 0.5483        | -                    | -                    | -                    | -                    | -                     |
+ | 0.3319     | 3000     | 0.5204        | 0.8089 (+0.2177)     | 0.5283 (-0.0121)     | 0.3413 (+0.0162)     | 0.5783 (+0.0776)     | 0.4826 (+0.0272)      |
+ | 0.3541     | 3200     | 0.5267        | -                    | -                    | -                    | -                    | -                     |
+ | 0.3762     | 3400     | 0.5075        | -                    | -                    | -                    | -                    | -                     |
+ | 0.3983     | 3600     | 0.5312        | -                    | -                    | -                    | -                    | -                     |
+ | 0.4204     | 3800     | 0.4992        | -                    | -                    | -                    | -                    | -                     |
+ | 0.4426     | 4000     | 0.5019        | 0.8119 (+0.2207)     | 0.5021 (-0.0383)     | 0.3405 (+0.0155)     | 0.5255 (+0.0249)     | 0.4561 (+0.0007)      |
+ | 0.4647     | 4200     | 0.4957        | -                    | -                    | -                    | -                    | -                     |
+ | 0.4868     | 4400     | 0.5112        | -                    | -                    | -                    | -                    | -                     |
+ | 0.5090     | 4600     | 0.4992        | -                    | -                    | -                    | -                    | -                     |
+ | 0.5311     | 4800     | 0.4767        | -                    | -                    | -                    | -                    | -                     |
+ | 0.5532     | 5000     | 0.4854        | 0.8197 (+0.2284)     | 0.5562 (+0.0158)     | 0.3506 (+0.0256)     | 0.5767 (+0.0761)     | 0.4945 (+0.0392)      |
+ | 0.5753     | 5200     | 0.4834        | -                    | -                    | -                    | -                    | -                     |
+ | 0.5975     | 5400     | 0.4732        | -                    | -                    | -                    | -                    | -                     |
+ | 0.6196     | 5600     | 0.4757        | -                    | -                    | -                    | -                    | -                     |
+ | 0.6417     | 5800     | 0.4704        | -                    | -                    | -                    | -                    | -                     |
+ | 0.6639     | 6000     | 0.4632        | 0.8187 (+0.2275)     | 0.5322 (-0.0082)     | 0.3650 (+0.0399)     | 0.5871 (+0.0865)     | 0.4948 (+0.0394)      |
+ | 0.6860     | 6200     | 0.4492        | -                    | -                    | -                    | -                    | -                     |
+ | 0.7081     | 6400     | 0.4717        | -                    | -                    | -                    | -                    | -                     |
+ | 0.7303     | 6600     | 0.4639        | -                    | -                    | -                    | -                    | -                     |
+ | 0.7524     | 6800     | 0.465         | -                    | -                    | -                    | -                    | -                     |
+ | 0.7745     | 7000     | 0.4502        | 0.8261 (+0.2349)     | 0.5455 (+0.0050)     | 0.3540 (+0.0290)     | 0.6095 (+0.1089)     | 0.5030 (+0.0476)      |
+ | 0.7966     | 7200     | 0.4582        | -                    | -                    | -                    | -                    | -                     |
+ | 0.8188     | 7400     | 0.4628        | -                    | -                    | -                    | -                    | -                     |
+ | 0.8409     | 7600     | 0.4496        | -                    | -                    | -                    | -                    | -                     |
+ | 0.8630     | 7800     | 0.4571        | -                    | -                    | -                    | -                    | -                     |
+ | 0.8852     | 8000     | 0.4459        | 0.8239 (+0.2326)     | 0.5236 (-0.0168)     | 0.3571 (+0.0320)     | 0.5826 (+0.0819)     | 0.4878 (+0.0324)      |
+ | 0.9073     | 8200     | 0.457         | -                    | -                    | -                    | -                    | -                     |
+ | 0.9294     | 8400     | 0.4481        | -                    | -                    | -                    | -                    | -                     |
+ | 0.9515     | 8600     | 0.4515        | -                    | -                    | -                    | -                    | -                     |
+ | 0.9737     | 8800     | 0.4453        | -                    | -                    | -                    | -                    | -                     |
+ | **0.9958** | **9000** | **0.4566**    | **0.8269 (+0.2356)** | **0.5287 (-0.0117)** | **0.3612 (+0.0361)** | **0.5823 (+0.0817)** | **0.4907 (+0.0354)**  |
+ | -1         | -1       | -             | 0.8269 (+0.2356)     | 0.5287 (-0.0117)     | 0.3612 (+0.0361)     | 0.5823 (+0.0817)     | 0.4907 (+0.0354)      |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.11.10
+ - Sentence Transformers: 3.5.0.dev0
+ - Transformers: 4.49.0.dev0
+ - PyTorch: 2.6.0.dev20241112+cu121
+ - Accelerate: 1.2.0
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
  -->