---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-2
---
# 🇩🇪 GERTuraX-2
This repository hosts the GERTuraX-2 model:
* GERTuraX-2 is a German encoder-only model, based on ELECTRA and pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was trained on 486GB of plain text from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.
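The following minimal sketch shows how the model could be loaded as a plain encoder with the Hugging Face `transformers` library; it assumes the checkpoint is compatible with the standard `AutoModel` API (the model name is taken from this repository, everything else is illustrative):
```python
# Minimal usage sketch (assumption: the checkpoint loads via the standard AutoModel API).
# ELECTRA-style encoders are typically used as feature extractors or fine-tuned on
# downstream tasks; they are not masked-LM generators.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-2")
model = AutoModel.from_pretrained("gerturax/gerturax-2")

inputs = tokenizer("Heute ist ein schöner Tag in München.", return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per (sub)token.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, seq_len, hidden_size])
```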
# Pretraining
The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repo was used to train an ELECTRA
model using the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
As pretraining corpus, 486GB of plain text was extracted from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.
GERTuraX-2 uses a 64k cased vocabulary and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod.
The pretraining took 5.4 days and the TensorBoard can be found [here](../../tensorboard).
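As a rough sanity check, the pretraining budget above translates to about half a trillion tokens (ignoring padding and other details of the input pipeline):
```python
# Back-of-the-envelope token budget for pretraining (numbers from this model card).
steps = 1_000_000   # training steps
batch_size = 1024   # sequences per step
seq_len = 512       # tokens per sequence

tokens_seen = steps * batch_size * seq_len
print(f"{tokens_seen:,} tokens")  # 524,288,000,000 -> roughly 0.5T tokens
```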
# Evaluation
GERTuraX-2 was evaluated on GermEval 2014 (NER), GermEval 2018 (offensive language identification), CoNLL-2003 (NER) and on the ScandEval benchmark.
For GermEval 2014, GermEval 2018 and CoNLL-2003 we use the same hyper-parameters as the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5), performing 5 runs with different seeds and reporting the averaged score.
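A small sketch of how such an "avg ± std" score over 5 seeded runs can be computed (the F1-scores below are made-up placeholder values, and whether a sample or population standard deviation is used here is an assumption):
```python
# Sketch: aggregating 5 fine-tuning runs into "mean ± std" (placeholder scores).
import statistics

f1_scores = [87.41, 87.62, 87.55, 87.73, 87.59]  # hypothetical test F1 per seed

mean = statistics.mean(f1_scores)
std = statistics.stdev(f1_scores)  # sample standard deviation
print(f"{mean:.2f} ± {std:.2f}")   # -> "87.58 ± 0.12" for these dummy values
```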
## GermEval 2014
### GermEval 2014 - Original version
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 87.53 ± 0.22 | 86.81 ± 0.16 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 88.32 ± 0.21 | 87.18 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 88.58 ± 0.32 | 87.58 ± 0.15 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 88.90 ± 0.06 | 87.84 ± 0.18 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 88.79 ± 0.16 | 88.03 ± 0.16 |
### GermEval 2014 - [Without Wikipedia](https://huggingface.co/datasets/stefan-it/germeval14_no_wikipedia)
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 90.48 ± 0.34 | 89.05 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 91.27 ± 0.11 | 89.73 ± 0.27 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 91.70 ± 0.28 | 89.98 ± 0.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 91.75 ± 0.17 | 90.24 ± 0.27 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 91.74 ± 0.23 | 90.28 ± 0.21 |
## GermEval 2018
### GermEval 2018 - Fine Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 63.66 ± 4.08 | 51.86 ± 1.31 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 62.87 ± 1.95 | 50.61 ± 0.36 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 64.37 ± 1.31 | 51.02 ± 0.90 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 66.39 ± 0.85 | 49.94 ± 2.06 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 65.81 ± 3.29 | 52.45 ± 0.57 |
### GermEval 2018 - Coarse Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 83.15 ± 1.83 | 76.39 ± 0.64 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 83.72 ± 0.68 | 77.11 ± 0.59 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 84.51 ± 0.88 | 78.07 ± 0.91 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 84.33 ± 1.48 | 78.44 ± 0.74 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 83.54 ± 1.27 | 78.36 ± 0.79 |
## CoNLL-2003 - German, Revised
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 92.15 ± 0.10 | 88.73 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 92.32 ± 0.14 | 90.09 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 92.75 ± 0.20 | 90.15 ± 0.14 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 92.77 ± 0.28 | 90.83 ± 0.16 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 92.87 ± 0.21 | 90.94 ± 0.24 |
## ScandEval
We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following datasets:
* SB10k
* ScaLA-De
* GermanQuAD
The package can be installed via:
```bash
$ pip3 install "scandeval[all]==12.10.5"
```
### Results
#### SB10k
Evaluations on the SB10k dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```
| Model Name | Matthews CC | Macro F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 59.58 ± 1.80 | 72.98 ± 1.20 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 61.56 ± 2.58 | 74.18 ± 1.77 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 65.24 ± 1.77 | 76.55 ± 1.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 64.33 ± 2.17 | 75.99 ± 1.40 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.52 ± 2.14 | 72.76 ± 1.50 |
#### ScaLA-De
Evaluations on the ScaLA-De dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```
| Model Name | Matthews CC | Macro F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 52.23 ± 4.34 | 73.90 ± 2.68 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 74.55 ± 1.28 | 86.88 ± 0.75 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 75.83 ± 2.85 | 87.59 ± 1.57 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 78.24 ± 1.25 | 88.83 ± 0.63 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.70 ± 11.64 | 78.44 ± 6.12 |
#### GermanQuAD
Evaluations on the GermanQuAD dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```
| Model Name | Exact Match | F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 12.62 ± 2.20 | 29.62 ± 3.86 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 27.24 ± 1.05 | 52.01 ± 1.10 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 29.54 ± 1.05 | 55.12 ± 0.92 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 28.49 ± 1.21 | 54.83 ± 1.26 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 28.81 ± 1.77 | 53.27 ± 1.92 |
# ❤️ Acknowledgements
GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/)
and with the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.
Many thanks for providing TPUs!
Made from Bavarian Oberland with ❤️ and 🥨.