---
language: de
widget:
- text: Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben.
---
### Welcome to ParlBERT-Topic-German!
🏷 **Model description**
This model was trained on \~10k manually annotated interpellations (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) labeled with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks). It classifies a text into one of the 21 topic labels listed in the evaluation table (annotation codebook).
_Note: "Interpellation is a formal request of a parliament to the respective government." ([Wikipedia](https://en.wikipedia.org/wiki/Interpellation_(politics)))_
🗃 **Dataset**
| Party | Speeches | Tokens |
|----|----|----|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| cross-bencher | 200 | 86,170 |
🏃🏼‍♂️ **Model training**
**ParlBERT-Topic-German** was fine-tuned for topic classification on top of a domain-adapted model (GermanBERT fine-tuned on [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full)), using an interpellations dataset (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) annotated with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).
🤖 **Use**
```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
# By default, the pipeline returns only the highest-scoring label.
pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german")
text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."
print(pipeline_classification_topics(text))  # expected topic: Macroeconomics
```
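To inspect more than the single best label, the pipeline can be asked for the top-scoring topics per text. The following is a minimal sketch, not part of the original model card: `load_classifier` and `top_topics` are hypothetical helper names, while `top_k` and `truncation` are standard arguments of the `transformers` text-classification pipeline. Truncation is enabled here on the assumption that some interpellations exceed the model's maximum input length.

```python
def load_classifier():
    """Load the topic classifier (downloads the model on first use)."""
    from transformers import pipeline
    return pipeline("text-classification", model="chkla/parlbert-topic-german")


def top_topics(classifier, texts, k=3):
    """Return the k highest-scoring (label, score) pairs for each text."""
    # Passing a list of texts with top_k set yields one list of
    # {"label": ..., "score": ...} dicts per input text.
    results = classifier(list(texts), top_k=k, truncation=True)
    return [[(r["label"], round(r["score"], 3)) for r in per_text] for per_text in results]


# Usage (requires network access to fetch the model):
# classifier = load_classifier()
# print(top_topics(classifier, ["Wie hoch ist die Investitionsquote des Bundes?"]))
```

Passing the classifier in as an argument keeps the model load separate from the post-processing, so the helper can be reused across many batches without reloading the model.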
📊 **Evaluation**
The model was evaluated on an evaluation set (20%):
| Label | F1 (%) | Support |
|----|----|----|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| Civil Rights | 76.5 | 978 |
| Environment | 76.6 | 845 |
| Transportation | 86.0 | 800 |
| Law & Crime | 67.1 | 492 |
| Energy | 78.6 | 424 |
| Health | 78.2 | 418 |
| Domestic Commerce | 64.4 | 382 |
| Immigration | 81.0 | 376 |
| Labor | 69.1 | 344 |
| Macroeconomics | 62.8 | 339 |
| Agriculture | 76.3 | 292 |
| Social Welfare | 49.2 | 253 |
| Technology | 63.0 | 252 |
| Education | 71.6 | 183 |
| Housing | 79.6 | 178 |
| Foreign Trade | 61.5 | 139 |
| Culture | 54.6 | 69 |
| Public Lands | 45.4 | 55 |
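The table reports per-label scores only. As a rough summary, the sketch below derives a macro average and a support-weighted average from the rounded values in the table. These aggregates are computed here for illustration and are not figures reported by the authors; the variable names are arbitrary, and abbreviated label names are spelled out.

```python
# Per-label F1 (in %) and support, copied from the evaluation table.
results = {
    "International": (80.0, 1126), "Defense": (85.0, 1099),
    "Government": (71.3, 989), "Civil Rights": (76.5, 978),
    "Environment": (76.6, 845), "Transportation": (86.0, 800),
    "Law & Crime": (67.1, 492), "Energy": (78.6, 424),
    "Health": (78.2, 418), "Domestic Commerce": (64.4, 382),
    "Immigration": (81.0, 376), "Labor": (69.1, 344),
    "Macroeconomics": (62.8, 339), "Agriculture": (76.3, 292),
    "Social Welfare": (49.2, 253), "Technology": (63.0, 252),
    "Education": (71.6, 183), "Housing": (79.6, 178),
    "Foreign Trade": (61.5, 139), "Culture": (54.6, 69),
    "Public Lands": (45.4, 55),
}

# Macro average: every label counts equally, regardless of its size.
macro_f1 = sum(f1 for f1, _ in results.values()) / len(results)

# Weighted average: each label contributes in proportion to its support.
total_support = sum(n for _, n in results.values())
weighted_f1 = sum(f1 * n for f1, n in results.values()) / total_support

print(f"labels: {len(results)}, total support: {total_support}")
print(f"macro F1: {macro_f1:.1f}, support-weighted F1: {weighted_f1:.1f}")
```

The gap between the two averages reflects that the rarer labels tend to score lower than the frequent ones.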
⚠️ **Limitations**
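Text classifiers are often highly dependent on the domain and topics of their training data. This model may therefore perform worse on topics and text types that the training set does not cover. Performance also varies by label: in the evaluation above, F1 ranges from 45.4 (Public Lands, support 55) to 86.0 (Transportation, support 800).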
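One way to gauge this on your own material is to hand-label a small sample and compare it with the model's predictions. The following is a minimal sketch in plain Python; `per_label_f1` is a hypothetical helper, and the labels in the example are made up for illustration rather than taken from real model output.

```python
from collections import Counter


def per_label_f1(gold, predicted):
    """Compute F1 for every label that occurs in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed
    scores = {}
    for label in set(gold) | set(predicted):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy, made-up labels purely for illustration.
gold = ["Energy", "Health", "Energy", "Labor"]
predicted = ["Energy", "Health", "Labor", "Labor"]
scores = per_label_f1(gold, predicted)
for label in sorted(scores):
    print(f"{label}: {scores[label]:.2f}")
```

Labels whose F1 on such a sample falls well below the values in the evaluation table are candidates for manual review or additional training data.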
👥 **Cite**
```
@article{klamm2022frameast,
title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
journal={ParlaCLARIN III at LREC2022},
year={2022}
}
```
🐦 Twitter: [@chklamm](http://twitter.com/chklamm)