sdiazlor (HF staff) committed on
Commit ba60e8c · verified · 1 Parent(s): afaaf8a

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
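
Only `pooling_mode_mean_tokens` is enabled in this config, so the sentence vector is presumably the average of the token vectors. The following is a minimal pure-Python sketch of masked mean pooling, not the library's implementation. The function name `mean_pool`, the padding mask, and the toy 2-dimensional vectors (the config specifies 768 dimensions) are illustrative assumptions.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input (hypothetical): three real tokens and one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding token is excluded from both the sum and the count, which is why the mask matters for batches of unequal length.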
README.md ADDED
@@ -0,0 +1,628 @@
+ ---
+ base_model: nomic-ai/modernbert-embed-base
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:662
+ - loss:TripletLoss
+ widget:
+ - source_sentence: 'into (ETS No. 55), which entered
+
+
+ into
+
+
+ The current state of signatures and ratifications of the Convention and its Protocols
+ as well as the complete list of declarations and reservations are available at
+ www.conventions.coe.int.
+
+
+ Only the English and French versions of the Convention are authentic.
+
+
+ European Court of Human Rights
+
+
+ Council of Europe
+
+
+ 67075 Strasbourg cedex
+
+
+ France
+
+
+ www.echr.coe.int
+
+
+ Contents'
+ sentences:
+ - Can you provide the current state of signatures and ratifications of the Convention
+ and its Protocols as well as the complete list of declarations and reservations
+ which are available at www.conventions.coe.int?
+ - What is the binding force of a judgment in a court case?
+ - The current state of signatures and ratifications of the OECD and its Conventions
+ as well as the complete list of declarations and reservations are available at
+ www.oecd.org.
+ - source_sentence: 'understand or speak the language used in court.
+
+
+ ARTICLE 7
+
+
+ No punishment without law
+
+
+ 1. No one shall be held guilty of any criminal offence on account of any act or
+ omission which did not constitute a criminal offence under national or international
+ law at the time when it was committed. Nor shall a heavier penalty be imposed
+ than the one that was applicable at the time the criminal offence was committed.'
+ sentences:
+ - Is the entry into force provision similar to other international treaties?
+ - No one shall be held criminally liable for speaking a language other than that
+ used in court proceedings on account of any act or omission which did not constitute
+ a language offense under national or local dialect at the time when it was spoken.
+ - What does it mean to understand or speak the language used in court?
+ - source_sentence: '2. In respect of any member State which subsequently expresses
+ its consent to be bound by it, the Protocol shall enter into force on the first
+ day of the month following the expiration of a period of three months after the
+ date of the deposit of the instrument of ratification, acceptance or approval.
+ ARTICLE 8
+
+
+ Depositary functions
+
+
+ The Secretary General of the Council of Europe shall notify all the member States
+ of the Council of Europe of:
+
+
+ (a) any signature;'
+ sentences:
+ - Is the Civil Rights Act of 1964 a landmark legislation in the US that prohibits
+ employment discrimination?
+ - Is the Protocol's entry into force date based on the deposit of the instrument
+ of ratification, acceptance, or approval by each member State?
+ - The Secretary General of the Council of Europe shall notify all member States
+ of the Council of Europe of the first day of the month following a period of three
+ months after the deposit of the instrument of ratification, acceptance, or approval
+ in respect of any member State which subsequently expresses its consent to be
+ bound by a new treaty.
+ - source_sentence: 2. Any State may at any later date, by a declaration addressed
+ to the Secretary General of the Council of Europe, extend the application of this
+ Protocol to any other territory specified in the declaration. In respect of such
+ territory the Protocol shall enter into force on the first day of the month following
+ the expiration of a period of two months after the date of receipt by the Secretary
+ General of such declaration.
+ sentences:
+ - Can the provisions of Articles 1 to 5 of this document be regarded as additional
+ articles to the main agreement and apply accordingly?
+ - In respect of such territory, the council of Europe's secretary general shall
+ enter into force on the first day of the month following the expiration of a two-month
+ period after the date of receipt of a declaration from any state.
+ - Is any state allowed to extend the application of this protocol to another territory
+ at a later date?
+ - source_sentence: '**US Civil Rights Act of 1964**
+
+
+ The landmark legislation outlawed segregation in public facilities, employment,
+ and education. It protected individuals from discrimination based on race, color,
+ religion, sex, and national origin. Title VII prohibits employment discrimination,
+ Title II addressed public accommodations, and Title VI ensured equal access to
+ education and federal funding.
+
+
+ **Brown v. Board of Education (1954)**
+
+
+ The US Supreme Court decision declared segregation in public schools unconstitutional.
+ The court ruled that separate educational facilities are inherently unequal, leading
+ to the desegregation of schools across the US. This decision was a significant
+ milestone in the Civil Rights Movement.
+
+
+ **Canadian Charter of Rights and Freedoms**
+
+
+ The Canadian Charter, implemented in 1982, enshrines fundamental freedoms, including
+ freedom of expression and equality before the law. Section 15 ensures equal protection
+ and benefit of the law for all individuals, regardless of their identity.
+
+
+ **Mandela''s Fight against Apartheid**
+
+
+ Nelson Mandela played a pivotal role in the fight against apartheid in South Africa.
+ His release from prison in 1990 marked a turning point in the struggle for equality
+ and democracy. The African National Congress''s efforts led to the establishment
+ of a democratic government in 1994.
+
+
+ **UN Declaration on Human Rights**
+
+
+ The Universal Declaration of Human Rights, adopted in 1948, outlines fundamental
+ human rights and freedoms. Article 26 states that everyone has the right to education,
+ while Article 7 emphasizes the prohibition of discrimination. These principles
+ serve as a foundation for human rights globally.
+
+
+ **Racial Discrimination Act 1975 (Australia)**
+
+
+ This Australian legislation makes it unlawful to discriminate against individuals
+ based on their race, color, descent, or national or ethnic origin. The Act also
+ prohibits indirect discrimination and promotes equal opportunity.
+
+
+ **Civil Rights Act of 1967 (Canada)**
+
+
+ The Canadian Act prohibited discrimination in the provision of goods and services,
+ accommodation, and employment. It was a significant step towards promoting equality
+ and protecting the rights of marginalized groups in Canada.
+
+
+ **Marbury v. Madison (1803)**
+
+
+ In this landmark US Supreme Court case, the court established the principle of
+ judicial review. The decision ensured that the judiciary has the power to review
+ and strike down laws that are deemed unconstitutional, safeguarding individual
+ rights and liberties.
+
+
+ **Equal Protection Clause**
+
+
+ The 14th Amendment to the US Constitution guarantees equal protection under the
+ law for all citizens, regardless of their status. This clause has been instrumental
+ in protecting the rights of marginalized groups and ensuring equal justice for
+ all.
+
+
+ **Women''s Rights Movement**
+
+
+ The movement for women''s suffrage and equality gained momentum in the late 19th
+ and early 20th centuries. Key figures such as Elizabeth Cady Stanton and Susan
+ B. Anthony led the charge for women''s right to vote and equal rights in education
+ and employment.
+
+
+ **International Convention on the Elimination of All Forms of Racial Discrimination**
+
+
+ Adopted in 1965, this international treaty obliges states to eliminate racial
+ discrimination in all its forms. It promotes equality and encourages states to
+ take proactive measures to prevent and combat racial discrimination.
+
+
+ **The Unrepresented Nations and Peoples Organization (UNPO)**
+
+
+ This international organization advocates for the rights of unrepresented peoples
+ and nations. The UNPO works towards promoting equality and self-determination
+ for marginalized communities globally.
+
+
+ **US Voting Rights Act of 1965**
+
+
+ This legislation protected the voting rights of African Americans and other minority
+ groups. It eliminated literacy tests and ensured equal access to voting booths,
+ contributing to increased voter turnout and representation.
+
+
+ **Gideon v. Wainwright (1963)**
+
+
+ In this US Supreme Court case, the court ruled that indigent defendants have a
+ right to an attorney in criminal cases. The decision ensured that individuals
+ have access to equal justice, regardless of their financial situation.
+
+
+ **Women''s Right to Education**
+
+
+ The Convention on the Elimination of All Forms of Discrimination against Women
+ (CEDAW) ensures equal access to education for women. The treaty promotes women''s
+ rights and encourages states to eliminate all forms of discrimination against
+ women.'
+ sentences:
+ - What is the primary implication of the landmark legislation that outlawed racial
+ segregation in public facilities, employment, and education across major international
+ airlines and transportation systems in the US?
+ - What opinions does the Court give at the request of the Committee of Ministers?
+ - What is the significance of the landmark legislation that outlawed segregation
+ in public facilities, employment, and education in the US?
+ model-index:
+ - name: modernbert-embed-base-biencoder-human-rights
+ results:
+ - task:
+ type: triplet
+ name: Triplet
+ dataset:
+ name: Unknown
+ type: unknown
+ metrics:
+ - type: cosine_accuracy
+ value: 0.9819277108433735
+ name: Cosine Accuracy
+ ---
+
+ # modernbert-embed-base-biencoder-human-rights
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision 92168cbee600b1abbfc10842aba988aa69572291 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
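
The architecture ends with a `Normalize()` module, and the card lists cosine similarity as the similarity function. A small pure-Python sketch of why those two fit together: once vectors are scaled to unit length, their plain dot product equals their cosine similarity. The helper names and the toy 2-dimensional vectors are illustrative assumptions, not code from the library.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors (hypothetical); real embeddings have 768 dimensions.
a, b = [3.0, 4.0], [4.0, 3.0]
unit_dot = dot(l2_normalize(a), l2_normalize(b))
print(unit_dot, cosine(a, b))
```

Both printed values agree up to floating-point rounding, which is why normalized embeddings can be compared with a dot product alone.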
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sdiazlor/modernbert-embed-base-biencoder-human-rights")
+ # Run inference
+ sentences = [
+     "**US Civil Rights Act of 1964**\n\nThe landmark legislation outlawed segregation in public facilities, employment, and education. It protected individuals from discrimination based on race, color, religion, sex, and national origin. Title VII prohibits employment discrimination, Title II addressed public accommodations, and Title VI ensured equal access to education and federal funding.\n\n**Brown v. Board of Education (1954)**\n\nThe US Supreme Court decision declared segregation in public schools unconstitutional. The court ruled that separate educational facilities are inherently unequal, leading to the desegregation of schools across the US. This decision was a significant milestone in the Civil Rights Movement.\n\n**Canadian Charter of Rights and Freedoms**\n\nThe Canadian Charter, implemented in 1982, enshrines fundamental freedoms, including freedom of expression and equality before the law. Section 15 ensures equal protection and benefit of the law for all individuals, regardless of their identity.\n\n**Mandela's Fight against Apartheid**\n\nNelson Mandela played a pivotal role in the fight against apartheid in South Africa. His release from prison in 1990 marked a turning point in the struggle for equality and democracy. The African National Congress's efforts led to the establishment of a democratic government in 1994.\n\n**UN Declaration on Human Rights**\n\nThe Universal Declaration of Human Rights, adopted in 1948, outlines fundamental human rights and freedoms. Article 26 states that everyone has the right to education, while Article 7 emphasizes the prohibition of discrimination. These principles serve as a foundation for human rights globally.\n\n**Racial Discrimination Act 1975 (Australia)**\n\nThis Australian legislation makes it unlawful to discriminate against individuals based on their race, color, descent, or national or ethnic origin. The Act also prohibits indirect discrimination and promotes equal opportunity.\n\n**Civil Rights Act of 1967 (Canada)**\n\nThe Canadian Act prohibited discrimination in the provision of goods and services, accommodation, and employment. It was a significant step towards promoting equality and protecting the rights of marginalized groups in Canada.\n\n**Marbury v. Madison (1803)**\n\nIn this landmark US Supreme Court case, the court established the principle of judicial review. The decision ensured that the judiciary has the power to review and strike down laws that are deemed unconstitutional, safeguarding individual rights and liberties.\n\n**Equal Protection Clause**\n\nThe 14th Amendment to the US Constitution guarantees equal protection under the law for all citizens, regardless of their status. This clause has been instrumental in protecting the rights of marginalized groups and ensuring equal justice for all.\n\n**Women's Rights Movement**\n\nThe movement for women's suffrage and equality gained momentum in the late 19th and early 20th centuries. Key figures such as Elizabeth Cady Stanton and Susan B. Anthony led the charge for women's right to vote and equal rights in education and employment.\n\n**International Convention on the Elimination of All Forms of Racial Discrimination**\n\nAdopted in 1965, this international treaty obliges states to eliminate racial discrimination in all its forms. It promotes equality and encourages states to take proactive measures to prevent and combat racial discrimination.\n\n**The Unrepresented Nations and Peoples Organization (UNPO)**\n\nThis international organization advocates for the rights of unrepresented peoples and nations. The UNPO works towards promoting equality and self-determination for marginalized communities globally.\n\n**US Voting Rights Act of 1965**\n\nThis legislation protected the voting rights of African Americans and other minority groups. It eliminated literacy tests and ensured equal access to voting booths, contributing to increased voter turnout and representation.\n\n**Gideon v. Wainwright (1963)**\n\nIn this US Supreme Court case, the court ruled that indigent defendants have a right to an attorney in criminal cases. The decision ensured that individuals have access to equal justice, regardless of their financial situation.\n\n**Women's Right to Education**\n\nThe Convention on the Elimination of All Forms of Discrimination against Women (CEDAW) ensures equal access to education for women. The treaty promotes women's rights and encourages states to eliminate all forms of discrimination against women.",
+     'What is the significance of the landmark legislation that outlawed segregation in public facilities, employment, and education in the US?',
+     'What is the primary implication of the landmark legislation that outlawed racial segregation in public facilities, employment, and education across major international airlines and transportation systems in the US?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.9819** |
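
As a rough illustration of what a triplet accuracy under cosine similarity measures, the sketch below counts the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. This is a simplified reading of the metric, not the `TripletEvaluator` source; the function names and toy vectors are assumptions. For scale, the reported 0.9819277108433735 is consistent with 163 correct triplets out of the 166 evaluation samples listed in this card.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_accuracy(triplets: Sequence[Tuple[Vector, Vector, Vector]]) -> float:
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)


# Four toy triplets (hypothetical embeddings); the last one is ranked wrongly.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.2, 0.8], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),
]
accuracy = cosine_accuracy(toy_triplets)
print(accuracy)
```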
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 662 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 662 samples:
+ | | anchor | positive | negative |
+ |:--------|:-------|:---------|:---------|
+ | type | string | string | string |
+ | details | <ul><li>min: 8 tokens</li><li>mean: 324.21 tokens</li><li>max: 2194 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 23.84 tokens</li><li>max: 79 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 36.85 tokens</li><li>max: 146 tokens</li></ul> |
+ * Samples:
+ | anchor | positive | negative |
+ |:-------|:---------|:---------|
+ | <code>Final judgments<br><br>1. The judgment of the Grand Chamber shall be final.<br><br>2. The judgment of a Chamber shall become final<br><br>(a) when the parties declare that they will not request that the<br><br>case be referred to the Grand Chamber; or<br><br>(b) three months after the date of the judgment, if reference of the case to the Grand Chamber has not been requested; or<br><br>(c) when the panel of the Grand Chamber rejects the request<br><br>to refer under Article 43.<br><br>3. The final judgment shall be published.<br><br>25<br><br>ARTICLE 45</code> | <code>What is the final judgment in a Chamber of the Grand Chamber?</code> | <code>The judgment of the Grand Chamber shall be final for the Grand Prix.</code> |
+ | <code>(b) any service of a military character or, in case of conscientious objectors in countries where they are recognised, service exacted instead of compulsory military service;<br><br>(c) any service exacted in case of an emergency or calamity<br><br>threatening the life or well-being of the community;<br><br>(d) any work or service which forms part of normal civic<br><br>obligations.<br><br>7</code> | <code>Is the service of a military character or service exacted in case of an emergency or calamity considered a civic obligation?</code> | <code>Any service of a military character or service exacted in case of a natural disaster threatening the economy is considered a civic duty.</code> |
+ | <code>Signature and ratification<br><br>1. This Convention shall be open to the signature of the members of the Council of Europe. It shall be ratified. Ratifications shall be deposited with the Secretary General of the Council of Europe.<br><br>2. The European Union may accede to this Convention.<br><br>31<br><br>3. The present Convention shall come into force after the deposit of ten instruments of ratification.</code> | <code>What are the requirements for signature and ratification of this Convention?</code> | <code>The Secretary General of the Council of Europe shall deposit the instruments of ratification for the new international treaty on environmental protection.</code> |
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
+ ```json
+ {
+     "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
+     "triplet_margin": 5
+ }
+ ```
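
The parameters above describe a triplet loss with Euclidean distance and a margin of 5. A minimal per-triplet sketch of that objective, `max(d(a, p) - d(a, n) + margin, 0)`, is shown below. The function names and toy vectors are illustrative assumptions, and the real loss is computed over batches of embeddings rather than single triplets. One observation under a further assumption: if the loss is applied to the unit-length outputs of the `Normalize()` module, Euclidean distances lie between 0 and 2, so with a margin of 5 the per-triplet loss cannot fall below 3.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: List[float], positive: List[float],
                 negative: List[float], margin: float = 5.0) -> float:
    """Hinge on the gap between anchor-positive and anchor-negative distance."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Best case for unit vectors (hypothetical values): the positive coincides
# with the anchor and the negative points in the opposite direction.
best_case = triplet_loss([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0])
print(best_case)
```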
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 166 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 166 samples:
+ | | anchor | positive | negative |
+ |:--------|:-------|:---------|:---------|
+ | type | string | string | string |
+ | details | <ul><li>min: 16 tokens</li><li>mean: 351.63 tokens</li><li>max: 2268 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 23.37 tokens</li><li>max: 59 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 36.6 tokens</li><li>max: 133 tokens</li></ul> |
+ * Samples:
+ | anchor | positive | negative |
+ |:-------|:---------|:---------|
+ | <code>**United States - Landmark Cases**<br><br>The landmark case of Brown v. Board of Education (1954) declared segregation in public schools unconstitutional. The ruling effectively overturned Plessy v. Ferguson (1896) and its "separate but equal" doctrine. The Civil Rights Act of 1964 prohibited discrimination in employment, public accommodations, and voting rights.<br><br>**Canada - Bill of Rights**<br><br>The Canadian Bill of Rights (1960) protects individuals from arbitrary state action, including racial and religious discrimination. It restricts the government's ability to infringe on fundamental freedoms, such as freedom of association and speech. The Canadian Human Rights Act (1977) prohibited discrimination in employment, housing, and services.<br><br>**India - Fundamental Rights**<br><br>The Indian Constitution (1950) guarantees fundamental rights, including equality, freedom of speech, and the right to life. The Scheduled Castes and Scheduled Tribes (Prevention of Atrocities) Act (1989) aims to protect vulner...</code> | <code>What are some landmark cases in the United States that declared segregation in public institutions unconstitutional?</code> | <code>What are some notable cases in the United States that declared the segregation of public institutions constitutional?</code> |
+ | <code>2. The Convention shall extend to the territory or territories named in the notification as from the thirtieth day after the receipt of this notification by the Secretary General of the Council of Europe.<br><br>3. The provisions of this Convention shall be applied in such territories with due regard, however, to local requirements.</code> | <code>What day does the Convention extend to the territory or territories as from the thirtieth day after the receipt of a notification by the Secretary General?</code> | <code>The Convention shall extend to the territory of a private island as from the thirtieth day after the receipt of a notification by the developer's project manager.</code> |
+ | <code>Advisory opinions<br><br>1. The Court may, at the request of the Committee of Ministers, give advisory opinions on legal questions concerning the interpretation of the Convention and the Protocols thereto.</code> | <code>What opinions does the Court give at the request of the Committee of Ministers?</code> | <code>The Committee of Experts may provide advisory opinions on technical questions concerning the interpretation of the Convention and the Protocols thereto.</code> |
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
+ ```json
+ {
+     "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
+     "triplet_margin": 5
+ }
+ ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `gradient_accumulation_steps`: 4
+ - `learning_rate`: 2e-05
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `use_mps_device`: True
+ - `load_best_model_at_end`: True
+ - `batch_sampler`: no_duplicates
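
Two of the settings above interact: a per-device batch size of 4 with 4 gradient accumulation steps gives 16 samples per optimizer update, assuming a single device. The sketch below also shows the general shape of a `cosine` schedule with warmup, namely a linear ramp to the peak rate followed by a half-cosine decay to zero. The function name `lr_at` is hypothetical, and `total_steps=120` with `warmup_steps=12` are made-up round numbers chosen to match the 0.1 warmup ratio; the step counts of the actual run are not given in the card.

```python
import math


def lr_at(step: int, total_steps: int, warmup_steps: int, peak_lr: float = 2e-5) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Samples contributing to one optimizer update (single-device assumption).
effective_batch_size = 4 * 4

# Illustrative step counts only; not taken from the training run.
TOTAL_STEPS, WARMUP_STEPS = 120, 12
for step in (0, 6, 12, 66, 120):
    print(step, lr_at(step, TOTAL_STEPS, WARMUP_STEPS))
```

The rate starts at zero, reaches the peak of 2e-05 at the end of warmup, passes through half the peak midway through the decay, and returns to zero at the final step.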
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 4
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: True
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
522
+ - `skip_memory_metrics`: True
523
+ - `use_legacy_prediction_loop`: False
524
+ - `push_to_hub`: False
525
+ - `resume_from_checkpoint`: None
526
+ - `hub_model_id`: None
527
+ - `hub_strategy`: every_save
528
+ - `hub_private_repo`: None
529
+ - `hub_always_push`: False
530
+ - `gradient_checkpointing`: False
531
+ - `gradient_checkpointing_kwargs`: None
532
+ - `include_inputs_for_metrics`: False
533
+ - `include_for_metrics`: []
534
+ - `eval_do_concat_batches`: True
535
+ - `fp16_backend`: auto
536
+ - `push_to_hub_model_id`: None
537
+ - `push_to_hub_organization`: None
538
+ - `mp_parameters`:
539
+ - `auto_find_batch_size`: False
540
+ - `full_determinism`: False
541
+ - `torchdynamo`: None
542
+ - `ray_scope`: last
543
+ - `ddp_timeout`: 1800
544
+ - `torch_compile`: False
545
+ - `torch_compile_backend`: None
546
+ - `torch_compile_mode`: None
547
+ - `dispatch_batches`: None
548
+ - `split_batches`: None
549
+ - `include_tokens_per_second`: False
550
+ - `include_num_input_tokens_seen`: False
551
+ - `neftune_noise_alpha`: None
552
+ - `optim_target_modules`: None
553
+ - `batch_eval_metrics`: False
554
+ - `eval_on_start`: False
555
+ - `use_liger_kernel`: False
556
+ - `eval_use_gather_object`: False
557
+ - `average_tokens_across_devices`: False
558
+ - `prompts`: None
559
+ - `batch_sampler`: no_duplicates
560
+ - `multi_dataset_batch_sampler`: proportional
561
+
562
+ </details>
563
+
564
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | cosine_accuracy |
+ |:----------:|:-------:|:-------------:|:---------------:|:---------------:|
+ | 1.0 | 42 | - | 3.6559 | 0.9699 |
+ | 2.0 | 84 | - | 3.5678 | 0.9880 |
+ | 2.3855 | 100 | 14.374 | - | - |
+ | **2.9398** | **123** | **-** | **3.4984** | **0.9819** |
+
+ * The bold row denotes the saved checkpoint.
573
+
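For readers unfamiliar with the `cosine_accuracy` column, a minimal sketch follows. It assumes the metric is the fraction of (anchor, positive, negative) triplets in which the anchor is more cosine-similar to the positive than to the negative; the helper names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)

# Two toy triplets (hypothetical values): the first is ranked correctly,
# the second is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),
]
toy_accuracy = cosine_accuracy(toy_triplets)
```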
574
+ ### Framework Versions
575
+ - Python: 3.11.4
576
+ - Sentence Transformers: 3.3.1
577
+ - Transformers: 4.49.0.dev0
578
+ - PyTorch: 2.4.0
579
+ - Accelerate: 0.34.0
580
+ - Datasets: 2.21.0
581
+ - Tokenizers: 0.21.0
582
+
583
+ ## Citation
584
+
585
+ ### BibTeX
586
+
587
+ #### Sentence Transformers
588
+ ```bibtex
589
+ @inproceedings{reimers-2019-sentence-bert,
590
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
591
+ author = "Reimers, Nils and Gurevych, Iryna",
592
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
593
+ month = "11",
594
+ year = "2019",
595
+ publisher = "Association for Computational Linguistics",
596
+ url = "https://arxiv.org/abs/1908.10084",
597
+ }
598
+ ```
599
+
600
+ #### TripletLoss
601
+ ```bibtex
602
+ @misc{hermans2017defense,
603
+ title={In Defense of the Triplet Loss for Person Re-Identification},
604
+ author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
605
+ year={2017},
606
+ eprint={1703.07737},
607
+ archivePrefix={arXiv},
608
+ primaryClass={cs.CV}
609
+ }
610
+ ```
611
+
612
+ <!--
613
+ ## Glossary
614
+
615
+ *Clearly define terms in order to be accessible across audiences.*
616
+ -->
617
+
618
+ <!--
619
+ ## Model Card Authors
620
+
621
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
622
+ -->
623
+
624
+ <!--
625
+ ## Model Card Contact
626
+
627
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
628
+ -->
config.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "_name_or_path": "nomic-ai/modernbert-embed-base",
3
+ "architectures": [
4
+ "ModernBertModel"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 50281,
9
+ "classifier_activation": "gelu",
10
+ "classifier_bias": false,
11
+ "classifier_dropout": 0.0,
12
+ "classifier_pooling": "mean",
13
+ "cls_token_id": 50281,
14
+ "decoder_bias": true,
15
+ "deterministic_flash_attn": false,
16
+ "embedding_dropout": 0.0,
17
+ "eos_token_id": 50282,
18
+ "global_attn_every_n_layers": 3,
19
+ "global_rope_theta": 160000.0,
20
+ "gradient_checkpointing": false,
21
+ "hidden_activation": "gelu",
22
+ "hidden_size": 768,
23
+ "initializer_cutoff_factor": 2.0,
24
+ "initializer_range": 0.02,
25
+ "intermediate_size": 1152,
26
+ "layer_norm_eps": 1e-05,
27
+ "local_attention": 128,
28
+ "local_rope_theta": 10000.0,
29
+ "max_position_embeddings": 8192,
30
+ "mlp_bias": false,
31
+ "mlp_dropout": 0.0,
32
+ "model_type": "modernbert",
33
+ "norm_bias": false,
34
+ "norm_eps": 1e-05,
35
+ "num_attention_heads": 12,
36
+ "num_hidden_layers": 22,
37
+ "pad_token_id": 50283,
38
+ "position_embedding_type": "absolute",
39
+ "reference_compile": false,
40
+ "repad_logits_with_grad": false,
41
+ "sep_token_id": 50282,
42
+ "sparse_pred_ignore_index": -100,
43
+ "sparse_prediction": false,
44
+ "torch_dtype": "float32",
45
+ "transformers_version": "4.49.0.dev0",
46
+ "vocab_size": 50368
47
+ }
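A few of these values can be combined into derived quantities. The sketch below assumes that layer `i` uses global attention when `i % global_attn_every_n_layers == 0` and local (sliding-window) attention otherwise; that rule is an assumption about how the field is interpreted, not something stated in this file.

```python
# Values copied from config.json above.
num_hidden_layers = 22
global_attn_every_n_layers = 3
hidden_size = 768
num_attention_heads = 12

# Assumed rule: every n-th layer, starting at layer 0, attends globally.
global_layers = [
    i for i in range(num_hidden_layers) if i % global_attn_every_n_layers == 0
]
local_layers = [i for i in range(num_hidden_layers) if i not in global_layers]

# Per-head width follows directly from the hidden size and head count.
head_dim = hidden_size // num_attention_heads
```

Under that assumption, 8 of the 22 layers attend globally and the remaining 14 use the 128-token local window.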
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.3.1",
4
+ "transformers": "4.49.0.dev0",
5
+ "pytorch": "2.4.0"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fd3f350ac15fda24c43118cbe086e9ef4c29222e2dd1a1b31074eae82b2b2639
3
+ size 596070136
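The LFS pointer's `size` gives a quick sanity check on the parameter count. The estimate below assumes every weight is stored as 4-byte `float32` (matching `torch_dtype` in `config.json`) and ignores the small safetensors header, so it slightly overstates the true number.

```python
size_bytes = 596070136   # from the LFS pointer above
bytes_per_param = 4      # float32, per "torch_dtype" in config.json

# Upper-bound estimate: the file also contains a JSON header.
approx_params = size_bytes // bytes_per_param
approx_params_millions = round(approx_params / 1e6)
```

That works out to roughly 149 million parameters.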
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
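The three modules run in order: token embeddings from the Transformer, pooling into one vector, then normalization. A minimal pure-Python sketch of the last two stages, assuming mean pooling over non-padding tokens (as `1_Pooling/config.json` sets `pooling_mode_mean_tokens` to true) and L2 normalization, is shown below; the helper names and toy values are hypothetical.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy 2-D "token embeddings" (hypothetical); the last token is padding
# and must not influence the result.
tokens = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
```

Because the output has unit length, cosine similarity between two such embeddings reduces to a dot product, which fits the `"similarity_fn_name": "cosine"` setting.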
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 8192,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": true,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "50358": {
852
+ "content": "[unused73]",
853
+ "lstrip": false,
854
+ "normalized": true,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "50359": {
860
+ "content": "[unused74]",
861
+ "lstrip": false,
862
+ "normalized": true,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "50360": {
868
+ "content": "[unused75]",
869
+ "lstrip": false,
870
+ "normalized": true,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "50361": {
876
+ "content": "[unused76]",
877
+ "lstrip": false,
878
+ "normalized": true,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "50362": {
884
+ "content": "[unused77]",
885
+ "lstrip": false,
886
+ "normalized": true,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "50363": {
892
+ "content": "[unused78]",
893
+ "lstrip": false,
894
+ "normalized": true,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "50364": {
900
+ "content": "[unused79]",
901
+ "lstrip": false,
902
+ "normalized": true,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "50365": {
908
+ "content": "[unused80]",
909
+ "lstrip": false,
910
+ "normalized": true,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "50366": {
916
+ "content": "[unused81]",
917
+ "lstrip": false,
918
+ "normalized": true,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "50367": {
924
+ "content": "[unused82]",
925
+ "lstrip": false,
926
+ "normalized": true,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ }
931
+ },
932
+ "clean_up_tokenization_spaces": true,
933
+ "cls_token": "[CLS]",
934
+ "extra_special_tokens": {},
935
+ "mask_token": "[MASK]",
936
+ "model_input_names": [
937
+ "input_ids",
938
+ "attention_mask"
939
+ ],
940
+ "model_max_length": 8192,
941
+ "pad_token": "[PAD]",
942
+ "sep_token": "[SEP]",
943
+ "tokenizer_class": "PreTrainedTokenizerFast",
944
+ "unk_token": "[UNK]"
945
+ }
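The special-token ids in `config.json` should line up with the entries in `added_tokens_decoder`. A small consistency check, with the relevant values copied by hand from the two files in this commit (the variable names are hypothetical), might look like this:

```python
# Ids copied from config.json.
config_ids = {
    "cls_token_id": 50281,
    "sep_token_id": 50282,
    "pad_token_id": 50283,
}
vocab_size = 50368

# Special-token entries copied from added_tokens_decoder.
added_tokens = {
    50280: "[UNK]",
    50281: "[CLS]",
    50282: "[SEP]",
    50283: "[PAD]",
    50284: "[MASK]",
}
highest_added_id = 50367  # "[unused82]"

expected = {
    "cls_token_id": "[CLS]",
    "sep_token_id": "[SEP]",
    "pad_token_id": "[PAD]",
}

# Collect any config id that does not map to the expected token string.
mismatches = {
    name: added_tokens.get(config_ids[name])
    for name, token in expected.items()
    if added_tokens.get(config_ids[name]) != token
}
ids_fit_vocab = highest_added_id + 1 == vocab_size
```

With the values above, `mismatches` is empty and the highest added token id is exactly `vocab_size - 1`.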