sileod committed
Commit 607f59c
1 Parent(s): b0d57dd

Update README.md

Files changed (1)
  1. README.md +44 -174

README.md CHANGED
@@ -1,4 +1,14 @@
  ---
  datasets:
  - hellaswag
  - ag_news
@@ -134,204 +144,64 @@ datasets:
  - winogrande
  - relbert/lexical_relation_classification
  - metaeval/linguisticprobing
  ---

- # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->

- # Table of Contents

- 1. [Model Details](#model-details)
- 2. [Uses](#uses)
- 3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- 4. [Training Details](#training-details)
- 5. [Evaluation](#evaluation)
- 6. [Model Examination](#model-examination-optional)
- 7. [Environmental Impact](#environmental-impact)
- 8. [Technical Specifications](#technical-specifications-optional)
- 9. [Citation](#citation-optional)
- 10. [Glossary](#glossary-optional)
- 11. [More Information](#more-information-optional)
- 12. [Model Card Authors](#model-card-authors-optional)
- 13. [Model Card Contact](#model-card-contact)
- 14. [How To Get Started With the Model](#how-to-get-started-with-the-model)

- # Model Details

- ## Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Related Models [optional]:** [More Information Needed]
- - **Parent Model [optional]:** [More Information Needed]
- - **Resources for more information:** [More Information Needed]
-
- # Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ## Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ## Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ## Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- # Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ## Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- # Training Details
-
- ## Training Data
-
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ## Training Procedure [optional]
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- ### Preprocessing
-
- [More Information Needed]
-
- ### Speeds, Sizes, Times
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- # Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ## Testing Data, Factors & Metrics
-
- ### Testing Data
-
- <!-- This should link to a Data Card if possible. -->
-
- [More Information Needed]
-
- ### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- ### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ## Results
-
- [More Information Needed]
-
- # Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- # Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- # Technical Specifications [optional]
-
- ## Model Architecture and Objective

- [More Information Needed]

- ## Compute Infrastructure

- [More Information Needed]

- ### Hardware

- [More Information Needed]

- ### Software

- [More Information Needed]

  # Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
  **BibTeX:**

- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- # Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- # More Information [optional]
-
- [More Information Needed]
-
- # Model Card Authors [optional]
-
- [More Information Needed]

  # Model Card Contact

- [More Information Needed]
-
- # How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- <details>
- <summary> Click to expand </summary>

- [More Information Needed]

  </details>
 
  ---
+ license: apache-2.0
+ language: en
+ tags:
+ - deberta-v3-base
+ - text-classification
+ - nli
+ - natural-language-inference
+ - multitask
+ - extreme-mtl
+ pipeline_tag: zero-shot-classification
  datasets:
  - hellaswag
  - ag_news

  - winogrande
  - relbert/lexical_relation_classification
  - metaeval/linguisticprobing
+ metrics:
+ - accuracy
+ library_name: transformers
  ---

+ # Model Card for DeBERTa-v3-base-tasksource-nli

+ A DeBERTa model jointly fine-tuned on 444 tasks of the tasksource collection: https://github.com/sileod/tasksource/
+ This is the model with the MNLI classifier on top. Its encoder was trained on many datasets, including bigbench, Anthropic/hh-rlhf..., alongside many NLI and classification tasks, with SequenceClassification heads while using only one shared encoder.

+ Each task had a task-specific CLS embedding, which was dropped 10% of the time to facilitate use of the model without it. All multiple-choice tasks used the same classification layers. Classification tasks shared weights if their labels matched.
+ The number of examples per task was capped to 64. The model was trained for 20k steps with a batch size of 384 and a peak learning rate of 2e-5.
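
+ As a sketch of the setup described above (hypothetical code, not the actual training implementation; module and argument names are invented), a per-task CLS embedding that is dropped part of the time could look like this:

+ ```python
+ import torch
+ import torch.nn as nn
+
+ class TaskCLS(nn.Module):
+     """Sketch: one learned CLS vector per task, dropped 10% of the time
+     during training so the encoder also works without it."""
+     def __init__(self, hidden_size: int, n_tasks: int, drop_rate: float = 0.1):
+         super().__init__()
+         self.task_cls = nn.Embedding(n_tasks, hidden_size)
+         self.drop_rate = drop_rate
+
+     def forward(self, inputs_embeds: torch.Tensor, task_id: int) -> torch.Tensor:
+         # inputs_embeds: (batch, seq_len, hidden); position 0 is the [CLS] slot
+         if self.training and torch.rand(()).item() < self.drop_rate:
+             return inputs_embeds  # keep the generic [CLS] embedding
+         out = inputs_embeds.clone()
+         out[:, 0, :] = self.task_cls.weight[task_id]  # task-specific CLS
+         return out
+ ```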

+ You can fine-tune this model for multiple-choice or any classification task (e.g. NLI), like any DeBERTa-v2 model.
+ This model has strong validation performance on many tasks (e.g. 70% accuracy on WNLI).
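
+ Given the `zero-shot-classification` pipeline tag above, the model also works out of the box through its NLI head (minimal example; the hub id `sileod/deberta-v3-base-tasksource-nli` is assumed from this card's title):

+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("zero-shot-classification",
+                       model="sileod/deberta-v3-base-tasksource-nli")  # assumed hub id
+ classifier("The movie was a complete waste of time.",
+            candidate_labels=["positive", "negative"])
+ ```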
+ The list of tasks is available in tasks.md.

+ Code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
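
+ The notebook linked above has the full setup; as a minimal sketch (same assumed hub id; `num_labels` and the example input are placeholders), fine-tuning for a new classification task works as for any DeBERTa-v2 checkpoint:

+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_id = "sileod/deberta-v3-base-tasksource-nli"  # assumed hub id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # ignore_mismatched_sizes replaces the 3-way NLI head with a fresh one
+ model = AutoModelForSequenceClassification.from_pretrained(
+     model_id, num_labels=2, ignore_mismatched_sizes=True)
+
+ enc = tokenizer("A cat sits on the mat.", return_tensors="pt")
+ logits = model(**enc).logits  # then train with transformers.Trainer as usual
+ ```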

+ ### Software

+ https://github.com/sileod/tasknet/
+ Training took 3 days on a 24GB GPU.

+ ## Model Recycling

+ [Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=1.41&mnli_lp=nan&20_newsgroup=0.63&ag_news=0.46&amazon_reviews_multi=-0.40&anli=0.94&boolq=2.55&cb=10.71&cola=0.49&copa=10.60&dbpedia=0.10&esnli=-0.25&financial_phrasebank=1.31&imdb=-0.17&isear=0.63&mnli=0.42&mrpc=-0.23&multirc=1.73&poem_sentiment=0.77&qnli=0.12&qqp=-0.05&rotten_tomatoes=0.67&rte=2.13&sst2=0.01&sst_5bins=-0.02&stsb=1.39&trec_coarse=0.24&trec_fine=0.18&tweet_ev_emoji=0.62&tweet_ev_emotion=0.43&tweet_ev_hate=1.84&tweet_ev_irony=1.43&tweet_ev_offensive=0.17&tweet_ev_sentiment=0.08&wic=-1.78&wnli=3.03&wsc=9.95&yahoo_answers=0.17&model_name=sileod%2Fdeberta-v3-base_tasksource-420&base_name=microsoft%2Fdeberta-v3-base) using sileod/deberta-v3-base_tasksource-420 as a base model yields an average score of 80.45, compared to 79.04 for microsoft/deberta-v3-base.

+ An earlier (weaker) version of this model was ranked 1st among all tested models for the microsoft/deberta-v3-base architecture as of 10/01/2023.
+ Results:

+ | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
+ |---------------:|----------:|-----------------------:|--------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|--------:|--------:|----------------:|
+ | 87.042 | 90.9 | 66.46 | 59.7188 | 85.5352 | 85.7143 | 87.0566 | 69 | 79.5333 | 91.6735 | 85.8 | 94.324 | 72.4902 | 90.2055 | 88.9706 | 63.9851 | 87.5 | 93.6299 | 91.7363 | 91.0882 | 84.4765 | 95.0688 | 56.9683 | 91.6654 | 98 | 91.2 | 46.814 | 84.3772 | 58.0471 | 81.25 | 85.2326 | 71.8821 | 69.4357 | 73.2394 | 74.0385 | 72.2 |

+ For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)

  # Citation [optional]

  **BibTeX:**

+ ```bib
+ @misc{sileod23-tasksource,
+   author = {Sileo, Damien},
+   doi = {10.5281/zenodo.7473446},
+   month = {01},
+   title = {{tasksource: preprocessings for reproducibility and multitask-learning}},
+   url = {https://github.com/sileod/tasksource},
+   version = {1.5.0},
+   year = {2023}}
+ ```

  # Model Card Contact

  </details>