---
license: apache-2.0
datasets:
- Alienmaster/SB10k
- cardiffnlp/tweet_sentiment_multilingual
- legacy-datasets/wikipedia
- community-datasets/gnad10
language:
- de
base_model: dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
---
## Tweet Style Classifier (German)
This model is a fine-tuned version of `dbmdz/bert-base-german-uncased` for binary classification: it predicts whether a German text is a tweet or not.
The dataset contained about 20K instances, split 50/50 between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets.
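A seeded shuffle followed by an 80/20 split can be sketched as below. This is a minimal illustration, not the original preprocessing code: the helper name `shuffle_and_split` and the placeholder examples are hypothetical, and Python's `random` module stands in for whatever library was actually used, so the resulting order will not match the original split.

```python
import random


def shuffle_and_split(examples, seed=42, test_fraction=0.2):
    """Shuffle a copy of `examples` with a fixed seed, then split it.

    Hypothetical helper: returns (train, test) lists.
    """
    shuffled = list(examples)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data: 20,000 (text, label) pairs with balanced labels.
examples = [(f"text {i}", i % 2) for i in range(20_000)]
train, test = shuffle_and_split(examples)
print(len(train), len(test))
```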
Training ran for three epochs with a batch size of 8 on an NVIDIA RTX A6000 GPU. All other hyperparameters were left at the Hugging Face `Trainer` defaults.
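As a rough sense of scale, the number of optimizer steps implied by these settings can be estimated. The sketch below assumes exactly 16,000 training instances (80% of a round 20,000), a single GPU, and no gradient accumulation. None of these are stated precisely in this card, so the result is an approximation.

```python
import math


def total_optimizer_steps(n_train, batch_size, epochs):
    """Estimate optimizer steps, assuming one update per batch."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * epochs


# Assumed figures: 16,000 training instances, batch size 8, 3 epochs.
print(total_optimizer_steps(n_train=16_000, batch_size=8, epochs=3))  # 6000
```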
The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.
### How to use
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/tweet-style-classifier-de"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# model_max_length replaces the deprecated max_len argument
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
# Truncate inputs longer than 512 tokens instead of failing
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"
result = classifier(text)  # list of dicts with "label" and "score"
print(result)
```
Label 1 indicates that the text is predicted to be a tweet.
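The pipeline returns a list of dictionaries, each with a `label` string and a `score`. A small helper can turn that into a boolean. This sketch assumes the label strings follow the default `LABEL_<id>` naming (for example `LABEL_1`), which applies only if the model config does not define custom label names. The helper `is_tweet` and the example prediction are hypothetical.

```python
def is_tweet(prediction, tweet_label_id=1):
    """Return True if a pipeline prediction carries the tweet label.

    Assumes labels look like "LABEL_1" or "1" (assumption, see above).
    """
    label = str(prediction["label"])
    # Take the part after the last underscore and parse it as the class id.
    label_id = int(label.rsplit("_", 1)[-1])
    return label_id == tweet_label_id


# Made-up prediction in the shape the pipeline returns.
example_result = [{"label": "LABEL_1", "score": 0.98}]
print(is_tweet(example_result[0]))
```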
### Evaluation
Evaluation results on the test set:
| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99988 |
| Precision | 0.99901 |
| Recall    | 0.99901 |
| F1        | 0.99901 |
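For reference, the four metrics are conventionally defined from the confusion counts as in the sketch below, with label 1 (tweet) as the positive class. The card does not say which library or averaging mode produced the scores above, so this is an illustration of the standard binary definitions, not the original evaluation code. The function name and toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only, not the model's test set.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(metrics)
```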