Commit 08aa6a0 · 1 Parent(s): 03341de
Update README.md
README.md CHANGED
@@ -91,6 +91,7 @@ By removing the decoder we can *half the original number of parameters* (thus ha
 0. [Usage](##usage)
 1. [Why use T5ForSequenceClassification?](##why-use-t5forsequenceclassification?)
 2. [T5ForClassification vs T5](##t5forclassification-vs-t5)
+3. [Results](##results)

 ## Usage
 **T5ForSequenceClassification** supports the task of zero-shot classification.
@@ -124,5 +125,13 @@ Benefits and Drawbacks:
 - (**+**) No generation mistakes and faster prediction (no generation latency)
 - (**-**) Loses text-to-text ability

+## Results
+Results on the validation data of training tasks:
+| Dataset | Accuracy | F1 |
+|:-------:|:--------:|:--:|
+| MNLI (m) | 1 | 0.905 |
+| MNLI (mm) | 0.900 | 0.900 |
+| SNLI | 0.900 | 0.900 |
+| SciTail | 0.900 | 0.900 |

 Special thanks to [philschmid](https://huggingface.co/philschmid) for making a Flan-T5-xxl [checkpoint](https://huggingface.co/philschmid/flan-t5-xxl-sharded-fp16) in fp16.
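
The Results table reports Accuracy and F1 per dataset, but the commit does not say how either metric was computed. As a rough illustration only, the sketch below computes accuracy and a macro-averaged F1 from lists of gold and predicted labels. The macro averaging, the function names `accuracy` and `macro_f1`, and the example labels are assumptions made for this sketch, not details taken from the README.

```python
from collections import Counter


def _check(y_true, y_pred):
    # Both sequences must be non-empty and aligned one-to-one.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    _check(y_true, y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical NLI-style labels, for illustration only.
gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "neutral", "neutral", "neutral"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

If the reported F1 uses a different averaging scheme (for example micro or weighted), the numbers would differ from what this sketch produces.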