Commit 360ec4b
1 Parent(s): dcc302f
Update README.md
README.md CHANGED
@@ -20,6 +20,11 @@ You can find [here](https://huggingface.co/google/flan-t5-base?text=Premise%3A++
 Our motivation for building **T5ForSequenceClassification** is that the full original T5 architecture is not needed for most NLU tasks. Indeed, NLU tasks generally do not require generating text, so a large decoder is unnecessary.
 By removing the decoder we can *halve the original number of parameters* (and thus halve the computation cost) and *efficiently optimize* the network for the given task.
 
+## Table of Contents
+
+1. [Why use T5ForSequenceClassification?](#why-use-t5forsequenceclassification)
+2. [T5ForClassification vs T5](#t5forclassification-vs-t5)
+
 ## Why use T5ForSequenceClassification?
 Models based on the [BERT](https://huggingface.co/bert-large-uncased) architecture, like [RoBERTa](https://huggingface.co/roberta-large) and [DeBERTa](https://huggingface.co/microsoft/deberta-v2-xxlarge), have shown very strong performance on sequence classification tasks and are still widely used today.
 However, those models only scale up to ~1.5B parameters (DeBERTa xxlarge), resulting in limited knowledge compared to bigger models.
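
To make the encoder-only idea in the changed README concrete, here is a minimal sketch of a T5 classifier built from `T5EncoderModel` in `transformers`. The class name `EncoderOnlyT5Classifier`, the mean-pooling strategy, and the `num_labels` default are illustrative assumptions, not necessarily this repository's actual implementation.

```python
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel


class EncoderOnlyT5Classifier(nn.Module):
    """Sketch of a T5 classifier that drops the decoder (assumed design)."""

    def __init__(self, model_name: str = "t5-base", num_labels: int = 2):
        super().__init__()
        # Keep only the encoder stack: roughly half the parameters of full T5.
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Mean-pool over non-padding tokens, then project to class logits.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)


tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = EncoderOnlyT5Classifier()
inputs = tokenizer(
    "Premise: A man is playing guitar. Hypothesis: A person makes music.",
    return_tensors="pt",
)
logits = model(**inputs)  # shape: (1, num_labels)
```

The premise/hypothesis input format follows the NLI example linked at the top of the README; only the encoder runs at inference, which is where the halved parameter count and compute cost come from.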