Commit 41dadcc (verified) by tomaarsen · Parent(s): 66c5389

Add new CrossEncoder model

README.md ADDED
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:82326
- loss:ListNetLoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 91.67425151971155
  energy_consumed: 0.23584713101479168
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.862
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ],
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
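
In retrieval pipelines, a cross-encoder like this one is typically applied on top of a cheaper first-stage retriever. Below is a minimal retrieve-and-rerank sketch; the bi-encoder used as the retriever (`sentence-transformers/all-MiniLM-L6-v2`) is an arbitrary choice for illustration, not part of this model card:

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # illustrative choice
reranker = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet")

query = "How many calories in an egg"
corpus = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]

# 1. Retrieve candidates cheaply with the bi-encoder
hits = util.semantic_search(retriever.encode(query), retriever.encode(corpus), top_k=3)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# 2. Re-score each (query, candidate) pair with the cross-encoder and re-sort
scores = reranker.predict([(query, doc) for doc in candidates])
for score, doc in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.4f}  {doc}")
```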

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)

| Metric      | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5020 (+0.0124)     | 0.3389 (+0.0684)     | 0.5833 (+0.1626)     |
| mrr@10      | 0.4884 (+0.0109)     | 0.5581 (+0.0582)     | 0.5848 (+0.1581)     |
| **ndcg@10** | **0.5545 (+0.0141)** | **0.3595 (+0.0345)** | **0.6487 (+0.1481)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4747 (+0.0812)     |
| mrr@10      | 0.5437 (+0.0757)     |
| **ndcg@10** | **0.5209 (+0.0655)** |

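These numbers can be reproduced with the evaluators named above. A minimal sketch using the `CENanoBEIREvaluator` linked in this section; the `dataset_names` argument and the shape of the returned results are assumptions on my part, not taken from this card:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CENanoBEIREvaluator

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet")

# Evaluate on the same three NanoBEIR subsets reported above
evaluator = CENanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
results = evaluator(model)
print(results)  # map, mrr@10 and ndcg@10 per dataset, plus the NanoBEIR mean
```
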
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 82,326 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                            | docs                                | labels                              |
  |:--------|:-------------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
  | type    | string                                                                                           | list                                | list                                |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.24 characters</li><li>max: 101 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>what are fiber lasers</code> | <code>['From Wikipedia, the free encyclopedia. A fiber laser or fibre laser is a laser in which the active gain medium is an optical fiber doped with rare-earth elements such as erbium, ytterbium, neodymium, dysprosium, praseodymium, and thulium. They are related to doped fiber amplifiers, which provide light amplification without lasing. Many high-power fiber lasers are based on double-clad fiber. The gain medium forms the core of the fiber, which is surrounded by two layers of cladding. The lasing mode propagates in the core, while a multimode pump beam propagates in the inner cladding layer. The outer cladding keeps this pump light confined.', 'The fiber laser is a variation on the standard solid-state laser, with the medium being a clad fiber rather than a rod, a slab, or a disk. Laser light is emitted by a dopant in the central core of the fiber, and the core structure can range from simple to fairly complex. The doped fiber has a cavity mirror on each end; in practice, these are fiber ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>fast can boar run</code> | <code>['A wild boar can run at speeds of 30-35mph which is about 48.3-56.3km/h. As for weight, a wild boar weighs around 52-91kg which is about 115-200 pounds. Wild boars are native to Europe, Africa, and some parts of Asia. The body of a wild boar is around 0.8-2 meters long which is about 2.6-6.6 feet long.', 'Wild Turkeys can run at speeds up to 25 mph, and they can fly up to 55 mph. However, if being hunted by someone for the Thanksgiving or Christmas table-Who know how fast the … y will run or fly!', 'A wild hog can reach speeds of up to 35 mph when running at full speed. A hippo can run over 30 mph! report this answer. Updated on Wednesday, February 01 2012 at 03:09PM EST. Source: www.texasboars.com/...', "Les. Brown bears-are extremely fast, capable of running in short bursts as high as of 40 mph (64 km/h). Polar bears-have been clocked at a top speed of 35 mph (56 km/h), along a a road in Churchill, Canada. Grizzly bears-can reach top speeds of up to 30 mph (48km/h), but they can't m...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what plant would grow in shade</code> | <code>['Hostas are among the showiest and easy-to-grow perennial plants that grow in shade. They also offer the most variety of any of the multiple shade plants. Choose from miniatures that stay only a couple of inches wide or giants that sprawl 6 feet across or more. Japanese forestgrass (Hakonechloa macra) is a wonderful grass for plants that grow in shade. It offers a lovely waterfall-like habit and variegated varieties have bight gold, yellow, or white in the foliage.', 'Lilyturf (Liriope) is an easy-to-grow favorite shade plant. Loved for its grassy foliage and spikes of blue or white flowers in late summer, as well as its resistance to deer and rabbits, lilyturf is practically a plant-it-and-forget garden resident. It grows best in Zones 5-10 and grows a foot tall. Japanese forestgrass (Hakonechloa macra) is a wonderful grass for plants that grow in shade. It offers a lovely waterfall-like habit and variegated varieties have bight gold, yellow, or white in the foliage.', "Gardening in ...</code> | <code>[1, 1, 0, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "eps": 1e-10,
      "pad_value": -1
  }
  ```
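
For reference, ListNet (Cao et al., 2007, cited below) trains the model by comparing "top-one" probability distributions: up to implementation details, the scores `s` predicted for the `n` candidate documents of one query and the gold `labels` `y` are each turned into a distribution with a softmax, and their cross-entropy is minimized:

```latex
% top-one probabilities from predicted scores s and gold labels y,
% and the resulting listwise cross-entropy loss for one query
P_s(d_i) = \frac{\exp(s_i)}{\sum_{j=1}^{n} \exp(s_j)}, \qquad
P_y(d_i) = \frac{\exp(y_i)}{\sum_{j=1}^{n} \exp(y_j)}

\mathcal{L} = -\sum_{i=1}^{n} P_y(d_i)\, \log\left(P_s(d_i) + \epsilon\right)
```

Here ε corresponds to the `eps` parameter above, guarding the logarithm, and `pad_value` (-1) marks slots used to pad queries with fewer candidates; exactly how padded slots are masked out is an implementation detail of the library.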

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 82,326 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                          | docs                                | labels                              |
  |:--------|:-----------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
  | type    | string                                                                                         | list                                | list                                |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.6 characters</li><li>max: 97 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>can blue cheese cause mold allergic reaction</code> | <code>['Mold Allergy. The blue spots found in blue cheese are mold. If you’ve been diagnosed with a mold allergy, eating blue cheese can trigger common mold allergic reaction symptoms. Mold allergies commonly arise from airborne spores during the spring, summer and fall months. Inhaled mold spores cause inflammation in the eyes, throat and sinuses. If eating blue cheese causes inflammation to develop anywhere in your body, make an appointment with your doctor because you may have an allergy to one or more of its ingredients. Blue cheese contains two highly allergenic substances: milk and mold. Most symptoms caused by an allergic reaction are the result of inflammation in soft tissue in different parts of the body. Your doctor may recommend allergy testing to determine the cause of the inflammation', 'Blue cheese allergy is a condition that has puzzled food experts quite a bit. The unique gourmet cheese with a mottled appearance can cause your body to swell up making you feel extremely uncomf...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what does it cost for a facebook ad</code> | <code>['Contributed by Jason Alleger. The cost of Facebook ads depends on a few factors, but generally ranges from $.05 – $5 per click. Facebook increases the cost of ads based on (a) targeting, (b) bids and (c) engagement. The more targeted your ads are, the more expensive they become. If you were to target ads to all Facebook users (all 1.06 billion), then you would pay just pennies. Sponsored Stories: 400 clicks to Facebook page – $200 ($.50 per click). Promoted Posts: 20,000 views – $100 ($5 per 1,000 views). It takes a lot of work to keep the cost-per-click down, as the advertiser needs to constantly be updating their ads to keep the cost low.', 'Can anyone who has advertised on facebook describe how much it cost you overall? Also, is there anyone who can mention if facebook advertising (and the specific type of facebook ad-social ad/etc, age group) was positive or negative for them in their ventures? Best Answer: Setting up an ad account and advertising on Facebook is easy. You can do ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how can ants get in dishwasher</code> | <code>["Full Answer. Ants usually find their way into a dishwasher through the dryer vents or the drain. Although most people's first reaction is to turn to pesticides to solve the problem, the chemicals contained in pesticides can be harmful for children and pets.", "No ants in the house. I've used traps on both sides of dishwasher and under the sink where the drain and supply holes are. We have put vinegar in the dishwasher drain & have let it sit there for three days and the ants still come back. They are only in side the dishwasher never on the counter ,floor, sink.", '1 Then leave them alone for a number of weeks. 2 Exterior: Sprinkle granular ant bait around ant hills, along ant trails; again, anywhere they appear. 3 Pets will not be injured by these baits. 4 The ants quickly take the bait below ground to the queen, destroying the colony.', "A: Empty the dishwasher completely, and pour 1 gallon of vinegar down the dishwasher's drain. Leave this for a few minutes so any ants appearin...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "eps": 1e-10,
      "pad_value": -1
  }
  ```
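
Both splits use the same listwise format. A hypothetical sketch of how such `(query, docs, labels)` records can be derived from the `microsoft/ms_marco` v1.1 schema; the exact preprocessing used for this model is not documented in this card:

```python
from datasets import load_dataset

ds = load_dataset("microsoft/ms_marco", "v1.1", split="train")

def to_listwise(example):
    passages = example["passages"]
    return {
        "query": example["query"],
        "docs": passages["passage_text"],   # the ~10 candidate passages for this query
        "labels": passages["is_selected"],  # 1 where annotators marked the passage relevant
    }

listwise = ds.map(to_listwise, remove_columns=ds.column_names)
```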

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True

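Put together, these settings map onto a training script roughly as in the sketch below. This assumes the `CrossEncoderTrainer` API of newer Sentence Transformers releases; the model itself was trained with a 3.5.0 development build (see Framework Versions), so names may differ slightly:

```python
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import ListNetLoss

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
loss = ListNetLoss(model)  # eps=1e-10, pad_value=-1, matching the parameters above

train_dataset = ...  # listwise (query, docs, labels) samples, e.g. as sketched earlier

args = CrossEncoderTrainingArguments(
    output_dir="reranker-msmarco-listnet",  # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    # evaluation and checkpoint-selection settings from the list below are omitted
)

trainer = CrossEncoderTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()
```
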
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10  | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10       | NanoBEIR_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
| -1         | -1       | -             | -               | 0.0444 (-0.4960)     | 0.2663 (-0.0587)     | 0.0478 (-0.4528)     | 0.1195 (-0.3359)      |
| 0.0001     | 1        | 2.0806        | -               | -                    | -                    | -                    | -                     |
| 0.0230     | 200      | 2.0875        | -               | -                    | -                    | -                    | -                     |
| 0.0459     | 400      | 2.097         | -               | -                    | -                    | -                    | -                     |
| 0.0689     | 600      | 2.0844        | -               | -                    | -                    | -                    | -                     |
| 0.0918     | 800      | 2.0771        | -               | -                    | -                    | -                    | -                     |
| 0.1148     | 1000     | 2.0699        | -               | -                    | -                    | -                    | -                     |
| 0.1377     | 1200     | 2.0864        | -               | -                    | -                    | -                    | -                     |
| 0.1607     | 1400     | 2.0676        | -               | -                    | -                    | -                    | -                     |
| 0.1836     | 1600     | 2.0772        | 2.0761          | 0.5280 (-0.0125)     | 0.3529 (+0.0279)     | 0.5989 (+0.0983)     | 0.4933 (+0.0379)      |
| 0.2066     | 1800     | 2.0822        | -               | -                    | -                    | -                    | -                     |
| 0.2295     | 2000     | 2.0777        | -               | -                    | -                    | -                    | -                     |
| 0.2525     | 2200     | 2.075         | -               | -                    | -                    | -                    | -                     |
| 0.2755     | 2400     | 2.0717        | -               | -                    | -                    | -                    | -                     |
| 0.2984     | 2600     | 2.0854        | -               | -                    | -                    | -                    | -                     |
| 0.3214     | 2800     | 2.0765        | -               | -                    | -                    | -                    | -                     |
| 0.3443     | 3000     | 2.0678        | -               | -                    | -                    | -                    | -                     |
| 0.3673     | 3200     | 2.076         | 2.0741          | 0.5368 (-0.0037)     | 0.3781 (+0.0531)     | 0.5847 (+0.0841)     | 0.4999 (+0.0445)      |
| 0.3902     | 3400     | 2.0749        | -               | -                    | -                    | -                    | -                     |
| 0.4132     | 3600     | 2.0735        | -               | -                    | -                    | -                    | -                     |
| 0.4361     | 3800     | 2.0636        | -               | -                    | -                    | -                    | -                     |
| 0.4591     | 4000     | 2.0749        | -               | -                    | -                    | -                    | -                     |
| 0.4820     | 4200     | 2.0745        | -               | -                    | -                    | -                    | -                     |
| 0.5050     | 4400     | 2.0716        | -               | -                    | -                    | -                    | -                     |
| 0.5279     | 4600     | 2.0741        | -               | -                    | -                    | -                    | -                     |
| 0.5509     | 4800     | 2.0724        | 2.0735          | 0.5633 (+0.0229)     | 0.3703 (+0.0453)     | 0.6102 (+0.1095)     | 0.5146 (+0.0592)      |
| 0.5739     | 5000     | 2.0788        | -               | -                    | -                    | -                    | -                     |
| 0.5968     | 5200     | 2.0711        | -               | -                    | -                    | -                    | -                     |
| 0.6198     | 5400     | 2.0708        | -               | -                    | -                    | -                    | -                     |
| 0.6427     | 5600     | 2.0645        | -               | -                    | -                    | -                    | -                     |
| 0.6657     | 5800     | 2.0684        | -               | -                    | -                    | -                    | -                     |
| 0.6886     | 6000     | 2.0731        | -               | -                    | -                    | -                    | -                     |
| 0.7116     | 6200     | 2.0745        | -               | -                    | -                    | -                    | -                     |
| 0.7345     | 6400     | 2.067         | 2.0722          | 0.5510 (+0.0105)     | 0.3441 (+0.0190)     | 0.5927 (+0.0921)     | 0.4959 (+0.0405)      |
| 0.7575     | 6600     | 2.0657        | -               | -                    | -                    | -                    | -                     |
| 0.7804     | 6800     | 2.0798        | -               | -                    | -                    | -                    | -                     |
| 0.8034     | 7000     | 2.0693        | -               | -                    | -                    | -                    | -                     |
| 0.8264     | 7200     | 2.074         | -               | -                    | -                    | -                    | -                     |
| 0.8493     | 7400     | 2.0744        | -               | -                    | -                    | -                    | -                     |
| 0.8723     | 7600     | 2.0688        | -               | -                    | -                    | -                    | -                     |
| 0.8952     | 7800     | 2.0515        | -               | -                    | -                    | -                    | -                     |
| **0.9182** | **8000** | **2.0765**    | **2.0723**      | **0.5545 (+0.0141)** | **0.3595 (+0.0345)** | **0.6487 (+0.1481)** | **0.5209 (+0.0655)**  |
| 0.9411     | 8200     | 2.0777        | -               | -                    | -                    | -                    | -                     |
| 0.9641     | 8400     | 2.073         | -               | -                    | -                    | -                    | -                     |
| 0.9870     | 8600     | 2.0726        | -               | -                    | -                    | -                    | -                     |
| -1         | -1       | -             | -               | 0.5545 (+0.0141)     | 0.3595 (+0.0345)     | 0.6487 (+0.1481)     | 0.5209 (+0.0655)      |

* The bold row denotes the saved checkpoint.

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.236 kWh
- **Carbon Emitted**: 0.092 kg of CO2
- **Hours Used**: 0.862 hours

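Measurements like these are typically collected by wrapping the training run in a CodeCarbon tracker; a generic sketch, not the exact instrumentation used here:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... training run ...
emissions = tracker.stop()  # estimated kg of CO2-eq emitted while tracking
print(f"{emissions:.3f} kg CO2-eq")
```
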
### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.48.3
- PyTorch: 2.5.0+cu121
- Accelerate: 1.3.0
- Datasets: 2.20.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListNetLoss
```bibtex
@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
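
Since the checkpoint is a plain `BertForSequenceClassification` with a single output label, it can also be scored directly with `transformers`. A minimal sketch, not from the model card; note that the `CrossEncoder` wrapper may additionally apply an activation function, so treat the raw logit only as a rank-ordering score:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Query and document are encoded together as one sequence pair
inputs = tokenizer(
    "How many calories in an egg",
    "Most of the calories in an egg come from the yellow yolk in the center.",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # single relevance logit
print(score)
```
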
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:497305d8a570b1af2cc5e4ddd4966abe3da7fe81dae668a26a05d4ccd3d87e3b
size 133464836
special_tokens_map.json ADDED
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
(contents not shown: the file is too large to render)
 
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
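
The tokenizer is a standard uncased WordPiece `BertTokenizer`, so each (query, document) pair is packed into a single `[CLS] query [SEP] document [SEP]` sequence. A quick sketch (the exact token split depends on WordPiece and is abbreviated here):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet")
encoding = tokenizer("How many calories in an egg",
                     "Most of the calories come from the yolk.")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# ['[CLS]', 'how', 'many', ..., 'egg', '[SEP]', 'most', ..., '[SEP]']
```
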
vocab.txt ADDED
(contents not shown: the file is too large to render)