	---
	language: de
	widget:
	- text: Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben.
	---

	### Welcome to ParlBERT-Topic-German!

	🏷 Model description

	This model was trained on ~10k manually annotated interpellations (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) labeled with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks). It classifies a text into one of twenty topic labels (annotation codebook).

	_Note: "Interpellation is a formal request of a parliament to the respective government." ([Wikipedia](https://en.wikipedia.org/wiki/Interpellation_(politics)))_

	🗃 Dataset

	| Party | Speeches | Tokens |
	|----|----|----|
	| CDU/CSU | 7,635 | 4,862,654 |
	| SPD | 5,321 | 3,158,315 |
	| AfD | 3,465 | 1,844,707 |
	| FDP | 3,067 | 1,593,108 |
	| The Greens | 2,866 | 1,522,305 |
	| The Left | 2,671 | 1,394,089 |
	| cross-bencher | 200 | 86,170 |
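
As a quick sanity check on the table above, the following sketch derives each party's share of all speeches and its average speech length in tokens. The counts are copied from the table; the names `corpus` and `party_stats` are illustrative only.

```python
# (speeches, tokens) per party, copied from the dataset table above.
corpus = {
    "CDU/CSU": (7_635, 4_862_654),
    "SPD": (5_321, 3_158_315),
    "AfD": (3_465, 1_844_707),
    "FDP": (3_067, 1_593_108),
    "The Greens": (2_866, 1_522_305),
    "The Left": (2_671, 1_394_089),
    "cross-bencher": (200, 86_170),
}

total_speeches = sum(speeches for speeches, _ in corpus.values())
total_tokens = sum(tokens for _, tokens in corpus.values())


def party_stats(speeches: int, tokens: int) -> tuple:
    """Return (share of all speeches in %, mean tokens per speech)."""
    return 100 * speeches / total_speeches, tokens / speeches


for party, (speeches, tokens) in corpus.items():
    share, mean_len = party_stats(speeches, tokens)
    print(f"{party:<14} {share:5.1f}% of speeches, {mean_len:6.1f} tokens/speech")
```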

	🏃🏼‍♂️ Model training

	ParlBERT-Topic-German starts from a domain-adapted model (GermanBERT fine-tuned on [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full)) and was fine-tuned for topic classification on an interpellations dataset (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).

	🤖 Use

	```python
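	from transformers import pipeline

	# Load the fine-tuned topic classifier from the Hugging Face Hub.
	# By default the pipeline returns only the highest-scoring label per text.
	pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german")

	text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."
	# The result is a list with one {"label": ..., "score": ...} dict per input text.
	print(pipeline_classification_topics(text))  # expected label: Macroeconomics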
	```
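
If a ranking of several candidate topics is more useful than the single best label, the pipeline can be asked for the top-k labels per text. The helper below is a minimal sketch: `top_topics` is a hypothetical name, and it assumes the classifier returns one list of `{"label", "score"}` dicts per input when called with a list of texts and `top_k`, which is how recent `transformers` versions behave.

```python
from typing import Callable, List, Sequence, Tuple


def top_topics(
    classifier: Callable, texts: Sequence[str], k: int = 3
) -> List[List[Tuple[str, float]]]:
    """Return the k highest-scoring (label, score) pairs for every text."""
    # Passing a list keeps the output shape uniform: one list of dicts per text.
    results = classifier(list(texts), top_k=k)
    return [[(r["label"], r["score"]) for r in per_text] for per_text in results]


# Usage with the real model (downloads weights from the Hugging Face Hub):
# from transformers import pipeline
# classifier = pipeline("text-classification", model="chkla/parlbert-topic-german")
# print(top_topics(classifier, ["Die Investitionsquote des Bundes ist nahezu konstant geblieben."]))
```

Because `top_topics` only needs a callable, it can be exercised with a stub classifier in tests without loading the model.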


	📊 Evaluation

	The model was evaluated on a 20% evaluation split. F1 score and support per label:

	| Label | F1 (%) | Support |
	|----|----|----|
	| International | 80.0 | 1,126 |
	| Defense | 85.0 | 1,099 |
	| Government | 71.3 | 989 |
	| Civil Rights | 76.5 | 978 |
	| Environment | 76.6 | 845 |
	| Transportation | 86.0 | 800 |
	| Law & Crime | 67.1 | 492 |
	| Energy | 78.6 | 424 |
	| Health | 78.2 | 418 |
	| Domestic Com. | 64.4 | 382 |
	| Immigration | 81.0 | 376 |
	| Labor | 69.1 | 344 |
	| Macroeconom. | 62.8 | 339 |
	| Agriculture | 76.3 | 292 |
	| Social Welfare | 49.2 | 253 |
	| Technology | 63.0 | 252 |
	| Education | 71.6 | 183 |
	| Housing | 79.6 | 178 |
	| Foreign Trade | 61.5 | 139 |
	| Culture | 54.6 | 69 |
	| Public Lands | 45.4 | 55 |
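
The table reports per-label scores only. As a rough summary, the sketch below aggregates them into a macro-averaged F1 (every label counts equally) and a support-weighted F1 (labels count in proportion to their support). The rows are copied from the table; since the inputs are rounded to one decimal, the aggregates are approximations and not figures reported by the authors.

```python
# (label, F1 in %, support) rows copied from the evaluation table above.
rows = [
    ("International", 80.0, 1126), ("Defense", 85.0, 1099),
    ("Government", 71.3, 989), ("Civil Rights", 76.5, 978),
    ("Environment", 76.6, 845), ("Transportation", 86.0, 800),
    ("Law & Crime", 67.1, 492), ("Energy", 78.6, 424),
    ("Health", 78.2, 418), ("Domestic Com.", 64.4, 382),
    ("Immigration", 81.0, 376), ("Labor", 69.1, 344),
    ("Macroeconom.", 62.8, 339), ("Agriculture", 76.3, 292),
    ("Social Welfare", 49.2, 253), ("Technology", 63.0, 252),
    ("Education", 71.6, 183), ("Housing", 79.6, 178),
    ("Foreign Trade", 61.5, 139), ("Culture", 54.6, 69),
    ("Public Lands", 45.4, 55),
]

# Macro average: unweighted mean of the per-label F1 scores.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: each label's F1 weighted by its support.
total_support = sum(support for _, _, support in rows)
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(f"macro F1:    {macro_f1:.1f}")
print(f"weighted F1: {weighted_f1:.1f}")
```

The macro average comes out lower than the weighted one because low-support labels such as Public Lands and Culture score well below the large classes.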


	⚠️ Limitations

	Models are often highly topic-dependent, so this model may perform worse on topics and text types that are not covered by the training set.
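
One pragmatic way to handle input from outside the training domain is to treat low-confidence predictions as unclassified and route them to manual review. The sketch below assumes predictions in the pipeline's `{"label", "score"}` format. The function name and the 0.5 threshold are illustrative and not part of the model; the threshold would need tuning on data from the target domain, and a high score is no guarantee of a correct label.

```python
from typing import Dict, List, Optional


def accept_or_abstain(
    predictions: List[Dict], threshold: float = 0.5
) -> List[Optional[str]]:
    """Keep a label only if its score reaches the threshold, else return None."""
    return [p["label"] if p["score"] >= threshold else None for p in predictions]


# Hand-written example predictions (not real model output).
example = [
    {"label": "Macroeconomics", "score": 0.91},
    {"label": "Culture", "score": 0.34},
]
print(accept_or_abstain(example))  # ['Macroeconomics', None]
```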

	👥 Cite
	```
	@article{klamm2022frameast,
	title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
	author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
	journal={ParlaCLARIN III at LREC2022},
	year={2022}
	}
	```

	🐦 Twitter: [@chklamm](http://twitter.com/chklamm)
