---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-2
---
# 🇩🇪 GERTuraX-2
This repository hosts the GERTuraX-2 model:
* GERTuraX-2 is a German encoder-only model based on ELECTRA, pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was trained on 486GB of plain text from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.
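# Usage
The checkpoint can be loaded with the Hugging Face `transformers` library. The following is a minimal sketch, assuming the hosted checkpoint follows the standard ELECTRA/`AutoModel` layout:
```python
from transformers import AutoModel, AutoTokenizer

# Minimal sketch: load GERTuraX-2 as a plain encoder and extract
# contextual embeddings for a German sentence.
tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-2")
model = AutoModel.from_pretrained("gerturax/gerturax-2")

inputs = tokenizer("Die Isar fließt durch München.", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per subword token: (batch, seq_len, hidden_size).
print(outputs.last_hidden_state.shape)
```
For downstream tasks such as NER, the encoder would typically be fine-tuned with a task-specific head (e.g. `AutoModelForTokenClassification`).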
# Pretraining
The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repo was used to train an ELECTRA
model using the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
As the pretraining corpus, 486GB of plain text was extracted from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.
GERTuraX-2 uses a 64k cased vocabulary and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod; this corresponds to roughly 5.2 × 10¹¹ subword tokens (1M × 1024 × 512, counting duplicates) seen during pretraining.
The pretraining took 5.4 days and the TensorBoard logs can be found [here](../../tensorboard).
# Evaluation
GERTuraX-2 was tested on GermEval 2014 (NER), GermEval 2018 (offensive language identification), CoNLL-2003 (NER) and on the ScandEval benchmark.
We use the same hyper-parameters for GermEval 2014, GermEval 2018 and CoNLL-2003 as used in the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5), performing 5 runs with different seeds and reporting the averaged score.
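The reported numbers are the mean over the five runs; the ± value is read here as the sample standard deviation. A trivial sketch of that aggregation, with placeholder scores rather than real results:
```python
import statistics

# Placeholder F1-scores from five fine-tuning runs with different seeds.
f1_scores = [87.4, 87.6, 87.5, 87.7, 87.7]

mean = statistics.mean(f1_scores)
stdev = statistics.stdev(f1_scores)  # sample standard deviation
print(f"{mean:.2f} ± {stdev:.2f}")
```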
## GermEval 2014
### GermEval 2014 - Original version
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 87.53 ± 0.22 | 86.81 ± 0.16 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 88.32 ± 0.21 | 87.18 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 88.58 ± 0.32 | 87.58 ± 0.15 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 88.90 ± 0.06 | 87.84 ± 0.18 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 88.79 ± 0.16 | 88.03 ± 0.16 |
### GermEval 2014 - [Without Wikipedia](https://huggingface.co/datasets/stefan-it/germeval14_no_wikipedia)
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 90.48 ± 0.34 | 89.05 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 91.27 ± 0.11 | 89.73 ± 0.27 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 91.70 ± 0.28 | 89.98 ± 0.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 91.75 ± 0.17 | 90.24 ± 0.27 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 91.74 ± 0.23 | 90.28 ± 0.21 |
## GermEval 2018
### GermEval 2018 - Fine Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 63.66 ± 4.08 | 51.86 ± 1.31 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 62.87 ± 1.95 | 50.61 ± 0.36 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 64.37 ± 1.31 | 51.02 ± 0.90 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 66.39 ± 0.85 | 49.94 ± 2.06 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 65.81 ± 3.29 | 52.45 ± 0.57 |
### GermEval 2018 - Coarse Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 83.15 ± 1.83 | 76.39 ± 0.64 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 83.72 ± 0.68 | 77.11 ± 0.59 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 84.51 ± 0.88 | 78.07 ± 0.91 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 84.33 ± 1.48 | 78.44 ± 0.74 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 83.54 ± 1.27 | 78.36 ± 0.79 |
## CoNLL-2003 - German, Revised
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 92.15 ± 0.10 | 88.73 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 92.32 ± 0.14 | 90.09 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 92.75 ± 0.20 | 90.15 ± 0.14 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 92.77 ± 0.28 | 90.83 ± 0.16 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 92.87 ± 0.21 | 90.94 ± 0.24 |
## ScandEval
We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following tasks:
* SB10k
* ScaLA-De
* GermanQuAD
The package can be installed via:
```bash
$ pip3 install "scandeval[all]==12.10.5"
```
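ScandEval can also be driven from Python. The following is a hedged sketch that assumes the `Benchmarker` entry point accepts task/language selectors analogous to the CLI flags shown below; consult the ScandEval documentation for the exact signature:
```python
from scandeval import Benchmarker

# Hedged sketch: the keyword names are assumed to mirror the CLI flags
# and may differ between ScandEval versions -- check its documentation.
benchmarker = Benchmarker(task="sentiment-classification", language="de")
benchmarker("gerturax/gerturax-2")
```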
### Results
#### SB10k
Evaluations on the SB10k dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```
| Model Name | Matthews CC | Macro F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 59.58 ± 1.80 | 72.98 ± 1.20 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 61.56 ± 2.58 | 74.18 ± 1.77 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 65.24 ± 1.77 | 76.55 ± 1.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 64.33 ± 2.17 | 75.99 ± 1.40 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.52 ± 2.14 | 72.76 ± 1.50 |
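"Matthews CC" is the Matthews correlation coefficient over the predicted class labels. For reference, it can be computed with scikit-learn; this toy example is purely illustrative and not ScandEval's internal code:
```python
from sklearn.metrics import matthews_corrcoef

# Toy ternary sentiment labels, for illustration only.
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]
print(matthews_corrcoef(y_true, y_pred))
```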
#### ScaLA-De
Evaluations on the ScaLA-De dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```
| Model Name | Matthews CC | Macro F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 52.23 ± 4.34 | 73.90 ± 2.68 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 74.55 ± 1.28 | 86.88 ± 0.75 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 75.83 ± 2.85 | 87.59 ± 1.57 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 78.24 ± 1.25 | 88.83 ± 0.63 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.70 ± 11.64 | 78.44 ± 6.12 |
#### GermanQuAD
Evaluations on the GermanQuAD dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```
| Model Name | Exact Match (EM) | F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 12.62 ± 2.20 | 29.62 ± 3.86 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 27.24 ± 1.05 | 52.01 ± 1.10 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 29.54 ± 1.05 | 55.12 ± 0.92 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 28.49 ± 1.21 | 54.83 ± 1.26 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 28.81 ± 1.77 | 53.27 ± 1.92 |
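Exact Match (EM) and F1 are the usual extractive-QA metrics: exact string match and token-overlap F1 between the predicted and gold answer spans. A simplified sketch of both (ScandEval's actual implementation applies additional answer normalization):
```python
# Simplified SQuAD-style QA metrics, for illustration only.
def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def f1(pred: str, gold: str) -> float:
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("1949", "1949"), f1("im Jahr 1949", "1949"))  # 1.0 0.5
```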
# ❤️ Acknowledgements
GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/)
and the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.
Many thanks for providing TPUs!
Made in the Bavarian Oberland with ❤️ and 🥨.