## Training configuration
We trained `google/umt5-small` [300 million parameters (~1.20 GB)] on the GreekSUM train split using the following parameters:
* GPU batch size = 6
* Total training epochs = 10
* AdamW optimizer (ε = 1e−8, β1 = 0.9 and β2 = 0.999)
* padding = 'max_length'
* truncation = True
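
Under these settings, the fine-tuning setup could be sketched with the Hugging Face `transformers` library roughly as below. This is a sketch, not the repository's actual training script: the output directory name, the `max_length` value, and the `article`/`summary` column names are assumptions, while the hyperparameters mirror the bullet list above (with β2 as the standard AdamW default).

```python
from transformers import Seq2SeqTrainingArguments

# Training arguments mirroring the bullets above (sketch; output_dir is hypothetical).
args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-greeksum",
    per_device_train_batch_size=6,
    num_train_epochs=10,
    adam_beta1=0.9,
    adam_beta2=0.999,   # assumed standard AdamW default
    adam_epsilon=1e-8,
)

# Tokenization mirroring the padding/truncation bullets.
# Column names and max_length are assumptions, not taken from this card.
def preprocess(batch, tokenizer, max_length=512):
    model_inputs = tokenizer(
        batch["article"],
        padding="max_length",
        truncation=True,
        max_length=max_length,
    )
    labels = tokenizer(
        batch["summary"],
        padding="max_length",
        truncation=True,
        max_length=max_length,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

A `Seq2SeqTrainer` would then be constructed from `args`, the model, and the tokenized GreekSUM splits.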

**Note:** Since T5-based models use a multi-task, text-to-text architecture, the prefix *'summarize: '* was prepended to each training sample.
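
Concretely, the prefixing step can be sketched as a plain preprocessing function (the `article` column name is an assumption, not taken from this card):

```python
PREFIX = "summarize: "

def add_prefix(batch):
    """Prepend the T5-style task prefix to every source text in a batch."""
    batch["article"] = [PREFIX + text for text in batch["article"]]
    return batch

# Toy batch with two Greek source texts.
sample = {"article": ["Το πρώτο άρθρο.", "Το δεύτερο άρθρο."]}
out = add_prefix(sample)
# Each source text now starts with "summarize: ".
```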
## Evaluation
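
As a reference for what the table below reports, ROUGE-1 F1 can be sketched in plain Python. Real evaluations use a proper ROUGE implementation with its own tokenization and (for BERTScore) a contextual-embedding model; this toy version only counts overlapping unigrams.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Toy ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat", "the cat sat down")  # ≈ 0.857
```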
**Approach**|**ROUGE-1**|**ROUGE-2**|**ROUGE-L**|**BERTScore**