---
language: de
widget:
- text: Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben.
---

### Welcome to ParlBERT-Topic-German!

🏷 **Model description**

This model was trained on \~10k manually annotated interpellations (📚 [Breunig/ Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) labeled with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks). It classifies a text into one of twenty labels (see the annotation codebook).

_Note: "Interpellation is a formal request of a parliament to the respective government." ([Wikipedia](https://en.wikipedia.org/wiki/Interpellation_(politics)))_

🗃 **Dataset**

| Party | Speeches | Tokens |
|----|----|----|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| cross-bencher | 200 | 86,170 |

🏃🏼‍♂️ **Model training**

**ParlBERT-Topic-German** was fine-tuned for topic classification, starting from a domain-adapted model (GermanBERT fine-tuned on [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full)). The fine-tuning data is an interpellations dataset (📚 [Breunig/ Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).

🤖 **Use**

```python
from transformers import pipeline

# Load the fine-tuned topic classifier from the Hugging Face Hub.
# By default the pipeline returns only the top-scoring label.
pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german")

text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."

pipeline_classification_topics(text)  # Macroeconomics
```
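
To inspect the scores for every topic instead of only the top one, something like the following sketch may help. The function names `classify_all_topics` and `top_topics` are illustrative helpers, not part of this model or of `transformers`. The sketch assumes a recent `transformers` release in which the text-classification pipeline accepts `top_k=None` and `truncation=True` at call time.

```python
def classify_all_topics(texts, model_name="chkla/parlbert-topic-german"):
    """Return, for each input text, a list of {'label', 'score'} dicts covering all topics."""
    # Imported lazily so that top_topics() below works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    # top_k=None requests the scores of every label; truncation=True clips
    # inputs that exceed the model's maximum sequence length.
    return classifier(list(texts), top_k=None, truncation=True)


def top_topics(label_scores, k=3):
    """Pick the k highest-scoring entries from one text's list of label scores."""
    ranked = sorted(label_scores, key=lambda item: item["score"], reverse=True)
    return [(item["label"], round(item["score"], 3)) for item in ranked[:k]]
```

A call such as `top_topics(classify_all_topics([text])[0])` would then list the three most likely topics for `text`. Looking at the runner-up labels can be useful when a text touches several policy areas.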

📊 **Evaluation**

The model was evaluated on an evaluation set (20% of the data). F1 scores are given in percent; support is the number of evaluation examples per label:

| Label | F1 (%) | Support |
|----|----|----|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| Civil Rights | 76.5 | 978 |
| Environment | 76.6 | 845 |
| Transportation | 86.0 | 800 |
| Law & Crime | 67.1 | 492 |
| Energy | 78.6 | 424 |
| Health | 78.2 | 418 |
| Domestic Commerce | 64.4 | 382 |
| Immigration | 81.0 | 376 |
| Labor | 69.1 | 344 |
| Macroeconomics | 62.8 | 339 |
| Agriculture | 76.3 | 292 |
| Social Welfare | 49.2 | 253 |
| Technology | 63.0 | 252 |
| Education | 71.6 | 183 |
| Housing | 79.6 | 178 |
| Foreign Trade | 61.5 | 139 |
| Culture | 54.6 | 69 |
| Public Lands | 45.4 | 55 |
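
The per-label scores can be summarized in two common ways: a macro average (every label counts equally) and a support-weighted average (frequent labels count more). The sketch below derives both from the table values only; the aggregates are not reported by the authors, and the names `RESULTS`, `macro_f1`, and `weighted_f1` are illustrative.

```python
# (label, F1 in percent, support) copied from the evaluation table.
RESULTS = [
    ("International", 80.0, 1126), ("Defense", 85.0, 1099),
    ("Government", 71.3, 989), ("Civil Rights", 76.5, 978),
    ("Environment", 76.6, 845), ("Transportation", 86.0, 800),
    ("Law & Crime", 67.1, 492), ("Energy", 78.6, 424),
    ("Health", 78.2, 418), ("Domestic Commerce", 64.4, 382),
    ("Immigration", 81.0, 376), ("Labor", 69.1, 344),
    ("Macroeconomics", 62.8, 339), ("Agriculture", 76.3, 292),
    ("Social Welfare", 49.2, 253), ("Technology", 63.0, 252),
    ("Education", 71.6, 183), ("Housing", 79.6, 178),
    ("Foreign Trade", 61.5, 139), ("Culture", 54.6, 69),
    ("Public Lands", 45.4, 55),
]


def macro_f1(rows):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_f1(rows):
    """Mean of the per-label F1 scores, weighted by each label's support."""
    total_support = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total_support


print(f"macro F1:    {macro_f1(RESULTS):.1f}")
print(f"weighted F1: {weighted_f1(RESULTS):.1f}")
```

The weighted average comes out higher than the macro average because the weakest labels (for example Public Lands and Culture) also have the least support.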

⚠️ **Limitations**

Models are often highly domain-dependent. This model may therefore perform worse on topics and text types that are not represented in the training set.

👥 **Cite**

```
@article{klamm2022frameast,
  title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
  author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
  journal={ParlaCLARIN III at LREC2022},
  year={2022}
}
```

🐦 Twitter: [@chklamm](http://twitter.com/chklamm)