Commit 589bce3 · verified · 1 Parent(s): f8ae257

mikeee committed: Add SetFit model
.gitattributes CHANGED
```diff
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+unigram.json filter=lfs diff=lfs merge=lfs -text
```
1_Pooling/config.json ADDED
```json
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
```
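
Only `pooling_mode_mean_tokens` is enabled, so each sentence embedding is the mean of the token embeddings over real (non-padding) tokens. A minimal PyTorch sketch of that pooling step; the function name and shapes are illustrative, assuming the standard sentence-transformers mean pooling:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over real (non-padding) tokens only."""
    # token_embeddings: (batch, seq_len, 384); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)  # zero out padding, sum over seq_len
    counts = mask.sum(dim=1).clamp(min=1e-9)       # real-token count per example
    return summed / counts                         # (batch, 384) sentence embeddings
```
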
README.md ADDED
---
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: 数字孪生包含三个核心元素:物理系统(产品、流程、网络)、代表它的虚拟模型以及实时更新模型的数据连接。虚拟模型反映了物理系统的当前状态和行为,并与来自传感器和物联网设备的数据持续同步。这种设置允许数字孪生模拟和预测物理系统在各种条件下的性能。将这三个组件结合在一起需要几项关键技术。首先,数据的收集和使用涉及云计算以及用于存储和处理的平台。其次,需要人工智能和机器学习来启用提供高级分析和准确虚拟模型的模拟模型。最后,增强现实和虚拟现实实现了数字模型和物理系统之间的高级可视化和交互。
- text: 这项技术使 BMO 能够在模型内完成远程站点评估和操作测试,而不会中断服务。结果:15 个月内节省了 50 多万美元,在 503 个地点收回了 6,000 个调查小时,并集中了分行资源和文档。
- text: 在银行业,数字孪生可能看起来像是增强的情景分析。如果您是这么想的,我们不会责怪您。但关键的区别就在这里:数据。传统情景分析依赖于静态数据,而数字孪生则使用实时动态数据并促进双向数据流。这意味着数字孪生可以利用其产生的洞察并触发更改以优化其复制的物理系统,而情景分析仅提供必须单独审查和采取行动的输出。
- text: '## 数字孪生解决了什么问题?'
- text: 数字孪生的使用始于 20 世纪 60 年代,当时 NASA 使用孪生模型在太空任务期间监控和调整航天器。最近,拜登政府宣布投资 2.85 亿美元用于半导体制造的数字孪生技术,因为该技术有潜力提高美国的效率、创新和弹性。
inference: true
model-index:
- name: SetFit with sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
    split: test
    metrics:
    - type: accuracy
      value: 0.8571428571428571
      name: Accuracy
---

# SetFit with sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer, as sketched below.

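A minimal sketch of those two steps, assuming the `setfit` 1.0 `Trainer` API (the toy dataset here is illustrative, not the actual training data):

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Illustrative examples borrowed from the label table below; the real
# training set has 14 classes with one example each.
train_dataset = Dataset.from_dict({
    "text": ["## How do they work?", "img__", "Let’s look at a few potential use cases for banks:"],
    "label": [6, 2, 11],
})

# Loading a plain Sentence Transformer checkpoint attaches the default
# LogisticRegression classification head.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=8, num_epochs=1),
    train_dataset=train_dataset,
)
trainer.train()  # step 1: contrastive fine-tuning of the body; step 2: fit the head
```
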
## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 128 tokens
- **Number of Classes:** 14 classes
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label | Examples |
|:------|:---------|
| 6 | <ul><li>'## How do they work?'</li></ul> |
| 7 | <ul><li>'A digital twin comprises three core elements: the physical system (product, process, network), a virtual model representing it and a data connection that updates the model in real time. The virtual model mirrors the physical system’s current state and behavior, continuously synchronized with data from sensors and internet of things devices. This setup allows the digital twin to simulate and predict the physical system’s performance under various conditions. Bringing all three components together requires several key technologies. First, the collection and use of data involves cloud computing and platforms for storage and processing. Second, AI and machine learning are needed to enable simulation models that provide advanced analytics and accurate virtual models. Lastly, augmented reality and virtual reality enable advanced visualization and interactions between the digital model and the physical system.'</li></ul> |
| 9 | <ul><li>'While data is the mantra of our modern age, data sets taken in isolation are of limited value because they tend to be sparse, noisy, and often indirect. Because systems exist across a web of components, any micro change results in a ripple effect, making accurately replicating a system extremely difficult. In banking, digital twin technology’s true potential is harnessed when integrated with a bank’s proprietary knowledge along with an inflow of external stimuli into decision-making models. With data flowing from multiple channels, using a mirrored environment enables precise contingency and incident response plans. When changes are made, other parts can adapt accordingly, simplifying coordination with business units and third parties. For example, a digital twin of a bank’s technology stack can predict outcomes of certain technology changes with the potential to evolve based on results from prior simulation runs. Digital twins can also mitigate risk across evolving fraud vectors through intelligent, comprehensive, data-driven strategic planning.'</li></ul> |
| 3 | <ul><li>'So-called “digital twins” are dynamic, virtual replicas of complex systems. Organizations often use them for scenario planning because they blend real-world elements with simulations and a constant flow of data, helping evaluate the consequences of different decisions. For example, when BMO acquired 503 Bank of the West branches in 2023, it used Matterport’s capture services to create dimensionally accurate 3D digital twins of all the branch locations within three months.'</li></ul> |
| 0 | <ul><li>'https://bankingjournal.aba.com/2024/08/precision-banking-the-digital-twin-advantage/ dt'</li></ul> |
| 11 | <ul><li>'Let’s look at a few potential use cases for banks:'</li></ul> |
| 13 | <ul><li>'**Digital financial twin.** This is an approach where digital twins could be used to precisely map financial and nonfinancial metrics across the life cycle of a bank product. The digital twin would be set up to link metrics related to the product’s service, partners, customers, and employees, resulting in efficient and quality decision-making. To go further, the digital twin would combine with real-time data from an enterprise resource planning system to ensure the highest level of resource optimization, drive sustainability and accelerate product development.'</li></ul> |
| 10 | <ul><li>'In the banking industry, digital twins may seem like enhanced scenario analysis. And if this is what you’re thinking, we don’t blame you. But here is where the key difference lies: data. Traditional scenario analysis relies on static data while digital twins use real-time dynamic data and facilitate bidirectional data flow. This means that a digital twin can take insights it produced and trigger changes to optimize the physical system it replicates, whereas scenario analysis merely provides an output that must be reviewed and acted upon separately.'</li></ul> |
| 5 | <ul><li>'The use of digital twins began in the 1960s when NASA used twin models to monitor and adjust spacecraft during space missions. Recently, the Biden administration announced a $285 million investment in digital twin technology for semiconductor manufacturing based on its potential to enhance efficiency, innovation, and resilience in the U.S.'</li></ul> |
| 2 | <ul><li>'img__'</li></ul> |
| 4 | <ul><li>'This technology enabled BMO to complete remote site assessments and operational tests within the model without service disruptions. The results: Over $500,000 saved in 15 months, 6,000 survey hours recouped across 503 locations, and branch resources and documentation centralized.'</li></ul> |
| 12 | <ul><li>'**Stress testing.** A digital twin could enable banks to simulate various scenarios, such as economic downturns, market fluctuations, or operational disruptions, to assess their resilience and performance under stress. Banks could identify weaknesses and mitigate risks preemptively by inputting diverse parameters to the digital twin. Add real-time insights and your bank can continuously adjust strategies that bolster resilience and stability.'</li></ul> |
| 1 | <ul><li>'# Precision banking: The ‘digital twin’ advantage'</li></ul> |
| 8 | <ul><li>'## What problem do digital twins solve?'</li></ul> |

## Evaluation

### Metrics
| Label | Accuracy |
|:--------|:---------|
| **all** | 0.8571 |

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("mikeee/setfit-model")
# Run inference
preds = model("## 数字孪生解决了什么问题?")
```
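
Still assuming the `setfit` 1.0 API used above, the same call accepts a batch of texts, and the `LogisticRegression` head's class probabilities are available via `predict_proba`:

```python
# One predicted label (0-13) per input text.
preds = model([
    "## 数字孪生解决了什么问题?",
    "img__",
])
# Per-class probabilities from the LogisticRegression head, shape (n_inputs, 14).
probas = model.predict_proba(["## 数字孪生解决了什么问题?"])
```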

<!--
### Downstream Use

*List how someone could finetune this model on their own dataset.*
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Set Metrics
| Training set | Min | Median | Max |
|:-------------|:----|:--------|:----|
| Word count | 1 | 50.7143 | 156 |

| Label | Training Sample Count |
|:------|:----------------------|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 13 | 1 |

### Training Hyperparameters
- batch_size: (8, 8)
- num_epochs: (1, 1)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 4
- body_learning_rate: (2e-05, 2e-05)
- head_learning_rate: 2e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False

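These fields mirror `setfit`'s `TrainingArguments`. A sketch of how the same configuration could be expressed in code, assuming the SetFit 1.0.3 API listed under Framework Versions (`distance_metric` and `margin` apply only to triplet-style losses and are left at their defaults here):

```python
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import TrainingArguments

# Tuples are (embedding fine-tuning phase, classifier head phase) values.
args = TrainingArguments(
    batch_size=(8, 8),
    num_epochs=(1, 1),
    max_steps=-1,                      # -1 means no step cap
    sampling_strategy="oversampling",
    num_iterations=4,
    body_learning_rate=(2e-05, 2e-05),
    head_learning_rate=2e-05,
    loss=CosineSimilarityLoss,
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=False,
)
```
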
### Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.1429 | 1 | 0.0039 | - |

### Framework Versions
- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 3.0.1
- Transformers: 4.39.0
- PyTorch: 2.3.1+cu121
- Datasets: 2.21.0
- Tokenizers: 0.15.2

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
```json
{
  "_name_or_path": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.39.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 250037
}
```
config_sentence_transformers.json ADDED
```json
{
  "__version__": {
    "sentence_transformers": "3.0.1",
    "transformers": "4.39.0",
    "pytorch": "2.3.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
```
config_setfit.json ADDED
```json
{
  "normalize_embeddings": false,
  "labels": null
}
```
model.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:6e5cd4937246ff7e7ae80accd13e025e51fde55991035397d9610222775d8b66
size 470637416
```
model_head.pkl ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:9b853608c506394fd6484cf69e5e68e65e6fcf2293f5ed45da3bc4d4a6875434
size 44071
```
modules.json ADDED
```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
```
sentence_bert_config.json ADDED
```json
{
  "max_seq_length": 128,
  "do_lower_case": false
}
```
special_tokens_map.json ADDED
```json
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
tokenizer.json ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:fa685fc160bbdbab64058d4fc91b60e62d207e8dc60b9af5c002c5ab946ded00
size 17083009
```
tokenizer_config.json ADDED
```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "250001": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "do_lower_case": true,
  "eos_token": "</s>",
  "mask_token": "<mask>",
  "max_length": 128,
  "model_max_length": 128,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
```
unigram.json ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:da145b5e7700ae40f16691ec32a0b1fdc1ee3298db22a31ea55f57a966c4a65d
size 14763260
```