---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-2
---

# 🇩🇪 GERTuraX-2

This repository hosts the GERTuraX-2 model:

* GERTuraX-2 is a German encoder-only model, based on ELECTRA and pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was pretrained on 486GB of plain text from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

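# Usage

GERTuraX-2 can be loaded like any other encoder model from the Hugging Face Hub, e.g. to extract contextual embeddings or as a backbone for fine-tuning. A minimal sketch, assuming the `transformers` and `torch` packages are installed (note that ELECTRA-style discriminators are intended for fine-tuning rather than masked-word prediction):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-2")
model = AutoModel.from_pretrained("gerturax/gerturax-2")

# Encode a German sentence and inspect the contextual token embeddings.
inputs = tokenizer("Der Zug nach München ist verspätet.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```
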
# Pretraining

The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repo was used to train an ELECTRA
model with the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) (Training ELECTRA Augmented with Multi-word Selection) approach.

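In a nutshell, ELECTRA-style pretraining trains a small generator to fill in masked positions and a discriminator to detect which tokens the generator replaced; TEAMS augments this setup with a multi-word selection task. A toy sketch of the replaced-token-detection labels (illustrative only, with made-up token ids, not the Model Garden implementation):

```python
import torch

# Original token ids vs. the sequence after the generator filled the masks.
original_ids = torch.tensor([[101, 2312, 57, 880, 102]])
corrupted_ids = torch.tensor([[101, 2312, 61, 880, 102]])  # position 2 replaced

# The discriminator is trained to predict, per token, whether it was replaced.
rtd_labels = (original_ids != corrupted_ids).long()
print(rtd_labels)  # tensor([[0, 0, 1, 0, 0]])
```
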
As pretraining corpus, 486GB of plain text was extracted from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

GERTuraX-2 uses a cased vocabulary with 64k subwords and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod.

The pretraining took 5.4 days; the TensorBoard logs can be found [here](../../tensorboard).

# Evaluation

GERTuraX-2 was evaluated on GermEval 2014 (NER), GermEval 2018 (offensive language identification), CoNLL-2003 (NER) and on the ScandEval benchmark.

We use the same hyper-parameters for GermEval 2014, GermEval 2018 and CoNLL-2003 as in the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5), run each fine-tuning five times with different seeds, and report the averaged scores.

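All scores below are thus reported as mean ± standard deviation over the five runs; a small sketch of that aggregation, with made-up scores:

```python
import statistics

# Hypothetical test F1-scores from five fine-tuning runs with different seeds.
scores = [87.41, 87.70, 87.55, 87.48, 87.76]

# Report mean ± sample standard deviation, as in the tables below.
print(f"{statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}")
```
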
## GermEval 2014

### GermEval 2014 - Original version

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 87.53 ± 0.22              | 86.81 ± 0.16       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 88.32 ± 0.21              | 87.18 ± 0.12       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 88.58 ± 0.32              | 87.58 ± 0.15       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 88.90 ± 0.06              | 87.84 ± 0.18       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 88.79 ± 0.16              | 88.03 ± 0.16       |

### GermEval 2014 - [Without Wikipedia](https://huggingface.co/datasets/stefan-it/germeval14_no_wikipedia)

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 90.48 ± 0.34              | 89.05 ± 0.21       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 91.27 ± 0.11              | 89.73 ± 0.27       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 91.70 ± 0.28              | 89.98 ± 0.22       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 91.75 ± 0.17              | 90.24 ± 0.27       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 91.74 ± 0.23              | 90.28 ± 0.21       |

## GermEval 2018

### GermEval 2018 - Fine Grained

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 63.66 ± 4.08              | 51.86 ± 1.31       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 62.87 ± 1.95              | 50.61 ± 0.36       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 64.37 ± 1.31              | 51.02 ± 0.90       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 66.39 ± 0.85              | 49.94 ± 2.06       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 65.81 ± 3.29              | 52.45 ± 0.57       |

### GermEval 2018 - Coarse Grained

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 83.15 ± 1.83              | 76.39 ± 0.64       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 83.72 ± 0.68              | 77.11 ± 0.59       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 84.51 ± 0.88              | 78.07 ± 0.91       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 84.33 ± 1.48              | 78.44 ± 0.74       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 83.54 ± 1.27              | 78.36 ± 0.79       |

## CoNLL-2003 - German, Revised

| Model Name                                                                          | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 92.15 ± 0.10              | 88.73 ± 0.21       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 92.32 ± 0.14              | 90.09 ± 0.12       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 92.75 ± 0.20              | 90.15 ± 0.14       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 92.77 ± 0.28              | 90.83 ± 0.16       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 92.87 ± 0.21              | 90.94 ± 0.24       |

## ScandEval

We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following datasets:

* SB10k
* ScaLA-De
* GermanQuAD

The package can be installed via:

```bash
$ pip3 install "scandeval[all]==12.10.5"
```

### Results

#### SB10k

Evaluations on the SB10k dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```

| Model Name                                                                          | Matthews CC               | Macro F1-Score     |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 59.58 ± 1.80              | 72.98 ± 1.20       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 61.56 ± 2.58              | 74.18 ± 1.77       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 65.24 ± 1.77              | 76.55 ± 1.22       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 64.33 ± 2.17              | 75.99 ± 1.40       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 59.52 ± 2.14              | 72.76 ± 1.50       |

#### ScaLA-De

Evaluations on the ScaLA-De dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```

| Model Name                                                                          | Matthews CC               | Macro F1-Score     |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 52.23 ± 4.34              | 73.90 ± 2.68       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 74.55 ± 1.28              | 86.88 ± 0.75       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 75.83 ± 2.85              | 87.59 ± 1.57       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 78.24 ± 1.25              | 88.83 ± 0.63       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 59.70 ± 11.64             | 78.44 ± 6.12       |

#### GermanQuAD

Evaluations on the GermanQuAD dataset can be started as follows:

```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```

| Model Name                                                                          | Exact Match               | F1-Score           |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co/deepset/gbert-base)                             | 12.62 ± 2.20              | 29.62 ± 3.86       |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB)                    | 27.24 ± 1.05              | 52.01 ± 1.10       |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB)                    | 29.54 ± 1.05              | 55.12 ± 0.92       |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB)                    | 28.49 ± 1.21              | 54.83 ± 1.26       |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base)                   | 28.81 ± 1.77              | 53.27 ± 1.92       |

# ❤️ Acknowledgements

GERTuraX is the outcome of the last 12 months of working with TPUs, kindly provided by the awesome [TRC program](https://sites.research.google/trc/about/),
and with the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.

Many thanks for providing TPUs!

Made from Bavarian Oberland with ❤️ and 🥨.