Update README.md
README.md CHANGED
@@ -154,15 +154,14 @@ library_name: transformers
 # Model Card for DeBERTa-v3-base-tasksource-nli
 
 DeBERTa pretrained model jointly fine-tuned on 444 tasks of the tasksource collection https://github.com/sileod/tasksource/
-
+You can fine-tune this model for multiple-choice or any classification task (e.g. NLI), like any DeBERTa model.
+This model has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI).
+The untuned model's CLS embedding also has strong linear probing performance (90% on MNLI), due to the multitask training.
 
+This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets, including bigbench and Anthropic/hh-rlhf..., alongside many NLI and classification tasks, each with a SequenceClassification head on a single shared encoder.
 Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
 The number of examples per task was capped to 64. The model was trained for 20k steps with a batch size of 384 and a peak learning rate of 2e-5.
 
-You can fine-tune this model to use it for multiple-choice or any classification task (e.g. NLI) like any debertav2 model.
-This model has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI).
-The untuned model CLS embedding also has strong linear probing performance (90% on MNLI), due to the multitask training.
-
 The list of tasks is available in tasks.md
 
 code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
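The updated card claims strong zero-shot performance. A minimal sketch of using the NLI head zero-shot via the standard `transformers` zero-shot pipeline; the hub id `sileod/deberta-v3-base-tasksource-nli` and the example text are assumptions, not stated in this diff:

```python
# Hedged sketch: zero-shot classification with the NLI head.
# The hub id below is an assumption; adjust to the actual checkpoint.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",
)

result = classifier(
    "The new movie was a complete waste of time.",
    candidate_labels=["positive", "negative", "neutral"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```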
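The card also says the checkpoint fine-tunes like any DeBERTa model. A sketch of loading it for a new classification task; the hub id, `num_labels=5`, and using `ignore_mismatched_sizes` to swap out the 3-way NLI head are illustrative assumptions:

```python
# Hedged sketch: re-head the checkpoint for a new classification task.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "sileod/deberta-v3-base-tasksource-nli"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=5,                  # illustrative: a 5-class target task
    ignore_mismatched_sizes=True,  # replace the 3-way NLI head
)
# Fine-tune as usual from here (e.g. with transformers.Trainer);
# for multiple choice, load AutoModelForMultipleChoice instead.
```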
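For the linear-probing claim, a sketch of probing the untuned CLS embedding; the toy sentence pairs and the scikit-learn logistic-regression probe are assumptions, and the card's 90% MNLI figure comes from a proper train/validation protocol, not this toy setup:

```python
# Hedged sketch: linear probe on the frozen [CLS] embedding.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

model_id = "sileod/deberta-v3-base-tasksource-nli"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id).eval()

def cls_embedding(premise: str, hypothesis: str) -> torch.Tensor:
    """Final-layer [CLS] vector for a premise/hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return encoder(**inputs).last_hidden_state[0, 0]

# Toy data; to reproduce the MNLI figure, featurize MNLI train/validation instead.
pairs = [("A man is sleeping.", "A person is resting."),
         ("A man is sleeping.", "A person is running.")]
labels = [0, 2]  # entailment, contradiction (toy)
X = torch.stack([cls_embedding(p, h) for p, h in pairs]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))
```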