|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- Alienmaster/SB10k |
|
- cardiffnlp/tweet_sentiment_multilingual |
|
- legacy-datasets/wikipedia |
|
- community-datasets/gnad10 |
|
language: |
|
- de |
|
base_model: dbmdz/bert-base-german-uncased |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
## Tweet Style Classifier (German) |
|
|
|
|
|
This model is a fine-tuned version of dbmdz/bert-base-german-uncased for a binary classification task: determining whether a German text is a tweet or not.
|
|
|
The dataset contained about 20K instances with a 50/50 distribution between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets.
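A minimal sketch of this shuffle-and-split step, assuming the collected instances are held in a Hugging Face `Dataset` with `text` and `label` columns (the actual preprocessing script is not part of this card, and the records below are placeholders):

```python
from datasets import Dataset

# Placeholder records; the real dataset has ~20K tweet / non-tweet instances.
records = {
    "text": ["Gestern war ein schöner Tag!", "Der Artikel beschreibt die Lage ausführlich."] * 5,
    "label": [1, 0] * 5,
}
dataset = Dataset.from_dict(records)

# Shuffle with seed 42 and split 80/20 into training and test sets.
splits = dataset.train_test_split(test_size=0.2, seed=42, shuffle=True)
train_ds, test_ds = splits["train"], splits["test"]
```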
|
Training was run on an NVIDIA RTX A6000 GPU for three epochs with a batch size of 8; all other hyperparameters were left at the Hugging Face Trainer defaults.
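A hedged sketch of the corresponding fine-tuning setup, reusing `train_ds` and `test_ds` from the split above. Only the epoch count and batch size are stated in this card; everything else (output directory, tokenization details) is an assumption:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "dbmdz/bert-base-german-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

def tokenize(batch):
    # Truncate inputs to the 512-token limit of the base model.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_tok = train_ds.map(tokenize, batched=True)
test_tok = test_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="tweet-style-classifier-de",  # assumed name
    num_train_epochs=3,
    per_device_train_batch_size=8,
    # remaining hyperparameters left at Trainer defaults
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=test_tok,
    tokenizer=tokenizer,
)
trainer.train()
```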
|
|
|
The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.
|
|
|
### How to use |
|
|
|
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/tweet-style-classifier-de"

# Load the fine-tuned model and its tokenizer, capping inputs at 512 tokens.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Build a classification pipeline that truncates longer inputs.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"
result = classifier(text)
```
|
Label 1 indicates that the text is predicted to be a tweet. |
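The pipeline returns a list of dictionaries; a hedged example of reading it is shown below. The exact label string (`LABEL_1` here) depends on the `id2label` mapping stored in the model config:

```python
# Continuing the example above; the label string is an assumption based on
# the default LABEL_0/LABEL_1 naming and may differ if id2label is customized.
print(result)  # e.g. [{'label': 'LABEL_1', 'score': 0.99}]
is_tweet = result[0]["label"].endswith("1")
```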
|
|
|
### Evaluation |
|
|
|
Evaluation results on the test set: |
|
|
|
| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99988 |
| Precision | 0.99901 |
| Recall    | 0.99901 |
| F1        | 0.99901 |
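
The evaluation script itself is not included in this card; a minimal sketch of how such metrics could be computed on the held-out split, assuming scikit-learn and placeholder label/prediction lists:

```python
# Hypothetical metric computation; y_true/y_pred stand in for the test labels
# and the classifier's predictions on the 20% test split.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 0]
y_pred = [1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
```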