sileod committed
Commit
edd4fd8
1 Parent(s): 80c5b7b

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -153,14 +153,14 @@ library_name: transformers
 
 # Model Card for DeBERTa-v3-base-tasksource-nli
 
-DeBERTa pretrained model jointly fine-tuned on 444 tasks of the tasksource collection https://github.com/sileod/tasksource/
-You can fine-tune this model to use it for multiple-choice or any classification task (e.g. NLI) like any deberta model.
+DeBERTa pretrained model jointly fine-tuned on 444 tasks of the [tasksource collection](https://github.com/sileod/tasksource/)
+You can fine-tune this model to use it for any classification or multiple-choice task, like any deberta model.
 This model has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI).
 The untuned model CLS embedding also has strong linear probing performance (90% on MNLI), due to the multitask training.
 
 This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets including bigbench, Anthropic/hh-rlhf... alongside many NLI and classification tasks with SequenceClassification heads while using only one shared encoder.
 Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
-The number of examples per task was capped to 64. The model was trained for 20k steps with a batch size of 384, a peak learning rate of 2e-5.
+The number of examples per task was capped to 64k. The model was trained for 20k steps with a batch size of 384, and a peak learning rate of 2e-5.
 
 The list of tasks is available in tasks.md
 
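
The card's claim that the model fine-tunes like any DeBERTa model amounts to swapping the NLI head for a task-specific one. A minimal sketch, assuming the hub ID `sileod/deberta-v3-base-tasksource-nli` (inferred from the card title) and an illustrative binary task:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "sileod/deberta-v3-base-tasksource-nli"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,                  # e.g. a binary task; illustrative
    ignore_mismatched_sizes=True,  # replace the 3-way NLI head with a fresh one
)
# ...then train with the Trainer API or a custom loop, as for any DeBERTa model.
```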
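Because this shared checkpoint keeps the MNLI classifier on top, it should plug into the standard `zero-shot-classification` pipeline. A minimal sketch; the premise and candidate labels are made up for illustration:

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",  # assumed hub ID
)

# The NLI head scores each label as a hypothesis against the input premise.
result = classifier(
    "The weather forecast predicts heavy rain all weekend.",
    candidate_labels=["weather", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```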
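The linear-probing claim (90% on MNLI from the untuned CLS embedding) can be reproduced in outline by freezing the encoder and fitting a linear classifier on CLS features. A sketch with toy data; the scikit-learn probe and example pairs are assumptions, not the card's evaluation setup:

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

model_id = "sileod/deberta-v3-base-tasksource-nli"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # encoder only, no task head
encoder.eval()

# Toy premise/hypothesis pairs with made-up labels, for illustration only.
premises = ["A man is playing a guitar.", "A man is playing a guitar."]
hypotheses = ["Someone is making music.", "The room is silent."]
labels = [0, 1]

with torch.no_grad():
    batch = tokenizer(premises, hypotheses, padding=True, return_tensors="pt")
    cls = encoder(**batch).last_hidden_state[:, 0]  # frozen CLS embeddings

probe = LogisticRegression().fit(cls.numpy(), labels)  # linear probe on frozen features
```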
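For reference, the reported hyperparameters expressed as `transformers` `TrainingArguments`; this is only a sketch of the stated numbers, not the actual multitask loop (task sampling, shared heads, and per-task CLS embeddings live in the tasksource codebase):

```python
from transformers import TrainingArguments

# output_dir is hypothetical; the batch size is the stated effective total
# (the device/accumulation split is not given in the card).
args = TrainingArguments(
    output_dir="tasksource-multitask",  # hypothetical path
    max_steps=20_000,                   # "trained for 20k steps"
    per_device_train_batch_size=384,    # effective batch size of 384
    learning_rate=2e-5,                 # peak learning rate
)
```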