---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-2
---

# 🇩🇪 GERTuraX-2

This repository hosts the GERTuraX-2 model:

* GERTuraX-2 is a German encoder-only model, based on ELECTRA and pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was trained on 486GB of plain text from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.
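
The following is a minimal usage sketch, assuming the checkpoint exposes the standard Hugging Face ELECTRA interface (the auto classes and the example sentence are illustrative, not part of the original card):

```python
# Minimal sketch: extract contextual embeddings with GERTuraX-2.
# ELECTRA-style encoders are discriminators, so AutoModel returns
# hidden states rather than masked-LM predictions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-2")
model = AutoModel.from_pretrained("gerturax/gerturax-2")

inputs = tokenizer("München liegt an der Isar.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, number of subwords, hidden size)
```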

# Pretraining

The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repository was used to train an ELECTRA model with the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.

As pretraining corpus, 486GB of plain text was extracted from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

GERTuraX-2 uses a 64k cased vocabulary and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod.

The pretraining took 5.4 days and the TensorBoard can be found [here](../../tensorboard).
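
As a back-of-the-envelope estimate, this schedule corresponds to 1,000,000 × 1,024 × 512 ≈ 524B subword tokens processed during pretraining (counting padding and duplicates).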

# Evaluation

GERTuraX-2 was tested on GermEval 2014 (NER), GermEval 2018 (offensive language identification), CoNLL-2003 (NER) and on the ScandEval benchmark.

For GermEval 2014, GermEval 2018 and CoNLL-2003 we use the same hyper-parameters as in the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5), performing 5 runs with different seeds and reporting the averaged score.
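
The aggregation itself is straightforward; a small sketch, assuming the reported ± denotes the standard deviation over the 5 runs (the scores below are placeholders, not results from this card):

```python
# Hypothetical aggregation over 5 fine-tuning runs with different seeds.
from statistics import mean, stdev

test_f1 = [87.41, 87.62, 87.55, 87.70, 87.64]  # one F1 score per seed

print(f"Avg. Test F1-Score: {mean(test_f1):.2f} ± {stdev(test_f1):.2f}")
```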

## GermEval 2014

### GermEval 2014 - Original version

| Model Name                                                        | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 87.53 ± 0.22              | 86.81 ± 0.16       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 88.32 ± 0.21              | 87.18 ± 0.12       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 88.58 ± 0.32              | 87.58 ± 0.15       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 88.90 ± 0.06              | 87.84 ± 0.18       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 88.79 ± 0.16              | 88.03 ± 0.16       |

### GermEval 2014 - [Without Wikipedia](https://huggingface.co/datasets/stefan-it/germeval14_no_wikipedia)

| Model Name                                                        | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 90.48 ± 0.34              | 89.05 ± 0.21       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 91.27 ± 0.11              | 89.73 ± 0.27       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 91.70 ± 0.28              | 89.98 ± 0.22       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 91.75 ± 0.17              | 90.24 ± 0.27       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 91.74 ± 0.23              | 90.28 ± 0.21       |

## GermEval 2018

### GermEval 2018 - Fine Grained

| Model Name                                                        | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 63.66 ± 4.08              | 51.86 ± 1.31       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 62.87 ± 1.95              | 50.61 ± 0.36       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 64.37 ± 1.31              | 51.02 ± 0.90       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 66.39 ± 0.85              | 49.94 ± 2.06       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 65.81 ± 3.29              | 52.45 ± 0.57       |

### GermEval 2018 - Coarse Grained

| Model Name                                                        | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 83.15 ± 1.83              | 76.39 ± 0.64       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 83.72 ± 0.68              | 77.11 ± 0.59       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 84.51 ± 0.88              | 78.07 ± 0.91       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 84.33 ± 1.48              | 78.44 ± 0.74       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 83.54 ± 1.27              | 78.36 ± 0.79       |

## CoNLL-2003 - German, Revised

| Model Name                                                        | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 92.15 ± 0.10              | 88.73 ± 0.21       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 92.32 ± 0.14              | 90.09 ± 0.12       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 92.75 ± 0.20              | 90.15 ± 0.14       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 92.77 ± 0.28              | 90.83 ± 0.16       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 92.87 ± 0.21              | 90.94 ± 0.24       |

## ScandEval

We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following tasks:

* SB10k (sentiment classification)
* ScaLA-De (linguistic acceptability)
* GermanQuAD (question answering)

The package can be installed via:

```bash
$ pip3 install "scandeval[all]==12.10.5"
```

### Results

#### SB10k

Evaluations on the SB10k dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```

| Model Name                                                        | Matthews CC   | Macro F1-Score |
| ----------------------------------------------------------------- | ------------- | -------------- |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 59.58 ± 1.80  | 72.98 ± 1.20   |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 61.56 ± 2.58  | 74.18 ± 1.77   |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 65.24 ± 1.77  | 76.55 ± 1.22   |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 64.33 ± 2.17  | 75.99 ± 1.40   |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 59.52 ± 2.14  | 72.76 ± 1.50   |
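
For reference, both reported metrics are available in scikit-learn; a small illustrative sketch with made-up labels (ScandEval's own implementation may differ):

```python
# Illustrative computation of the two SB10k metrics with made-up labels.
from sklearn.metrics import f1_score, matthews_corrcoef

y_true = [0, 1, 2, 1, 0, 2, 1]  # gold sentiment labels
y_pred = [0, 1, 1, 1, 0, 2, 2]  # model predictions

print(f"Matthews CC:    {100 * matthews_corrcoef(y_true, y_pred):.2f}")
print(f"Macro F1-Score: {100 * f1_score(y_true, y_pred, average='macro'):.2f}")
```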

#### ScaLA-De

Evaluations on the ScaLA-De dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```

| Model Name                                                        | Matthews CC   | Macro F1-Score |
| ----------------------------------------------------------------- | ------------- | -------------- |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 52.23 ± 4.34  | 73.90 ± 2.68   |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 74.55 ± 1.28  | 86.88 ± 0.75   |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 75.83 ± 2.85  | 87.59 ± 1.57   |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 78.24 ± 1.25  | 88.83 ± 0.63   |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 59.70 ± 11.64 | 78.44 ± 6.12   |

#### GermanQuAD

Evaluations on the GermanQuAD dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```

| Model Name                                                        | Exact Match   | F1-Score      |
| ----------------------------------------------------------------- | ------------- | ------------- |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)            | 12.62 ± 2.20  | 29.62 ± 3.86  |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)   | 27.24 ± 1.05  | 52.01 ± 1.10  |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)   | 29.54 ± 1.05  | 55.12 ± 0.92  |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)   | 28.49 ± 1.21  | 54.83 ± 1.26  |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)  | 28.81 ± 1.77  | 53.27 ± 1.92  |
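
For intuition on the two metrics above: Exact Match only scores a prediction that matches a gold answer string exactly, while F1 rewards token overlap. A simplified sketch (ScandEval's own implementation may differ, e.g. by normalizing casing and punctuation):

```python
# Simplified QA metrics for a single prediction/gold pair.
def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip() == gold.strip())

def token_f1(pred: str, gold: str) -> float:
    pred_tokens, gold_tokens = pred.split(), gold.split()
    # Count tokens shared between prediction and gold answer.
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t)) for t in set(pred_tokens))
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("im Süden", "im Süden von Deutschland"))         # 0.0
print(round(token_f1("im Süden", "im Süden von Deutschland"), 2))  # 0.67
```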

# ❤️ Acknowledgements

GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/) and with the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.

Many thanks for providing TPUs!

Made from Bavarian Oberland with ❤️ and 🥨.