|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- Alienmaster/SB10k |
|
- cardiffnlp/tweet_sentiment_multilingual |
|
- legacy-datasets/wikipedia |
|
- community-datasets/gnad10 |
|
language: |
|
- de |
|
base_model: dbmdz/bert-base-german-uncased |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
## Tweet Style Classifier (German) |
|
|
|
|
|
This model is a fine-tuned version of dbmdz/bert-base-german-uncased for a binary classification task: determining whether a German text is a tweet or not.
|
|
|
The dataset contained about 20K instances with a 50/50 distribution between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets.
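A minimal sketch of this shuffle-and-split step, assuming the collected instances are held in a Hugging Face `Dataset` with `text` and `label` columns (the actual preprocessing script is not part of this card, and the records below are placeholders):

```python
from datasets import Dataset

# Placeholder records; the real dataset has ~20K tweet / non-tweet instances.
records = {
    "text": ["Gestern war ein schöner Tag!", "Der Artikel beschreibt die Lage ausführlich."] * 5,
    "label": [1, 0] * 5,
}
dataset = Dataset.from_dict(records)

# Shuffle with seed 42 and split 80/20 into training and test sets.
splits = dataset.train_test_split(test_size=0.2, seed=42, shuffle=True)
train_ds, test_ds = splits["train"], splits["test"]
```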
|
Training was run on an NVIDIA RTX A6000 GPU for three epochs with a batch size of 8; all other hyperparameters were left at the Hugging Face Trainer defaults.
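A hedged sketch of the corresponding fine-tuning setup, reusing `train_ds` and `test_ds` from the split above. Only the epoch count and batch size are stated in this card; everything else (output directory, tokenization details) is an assumption:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "dbmdz/bert-base-german-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

def tokenize(batch):
    # Truncate inputs to the 512-token limit of the base model.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_tok = train_ds.map(tokenize, batched=True)
test_tok = test_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="tweet-style-classifier-de",  # assumed name
    num_train_epochs=3,
    per_device_train_batch_size=8,
    # remaining hyperparameters left at Trainer defaults
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=test_tok,
    tokenizer=tokenizer,
)
trainer.train()
```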
|
|
|
The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.
|
|
|
### How to use |
|
|
|
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/tweet-style-classifier-de"

# Load the fine-tuned model and its tokenizer, capping inputs at 512 tokens.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Build a classification pipeline that truncates longer inputs.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"
result = classifier(text)
```
|
Label 1 indicates that the text is predicted to be a tweet. |
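The pipeline returns a list of dictionaries; a hedged example of reading it is shown below. The exact label string (`LABEL_1` here) depends on the `id2label` mapping stored in the model config:

```python
# Continuing the example above; the label string is an assumption based on
# the default LABEL_0/LABEL_1 naming and may differ if id2label is customized.
print(result)  # e.g. [{'label': 'LABEL_1', 'score': 0.99}]
is_tweet = result[0]["label"].endswith("1")
```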
|
|
|
### Evaluation |
|
|
|
Evaluation results on the test set: |
|
|
|
| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99988 |
| Precision | 0.99901 |
| Recall    | 0.99901 |
| F1        | 0.99901 |
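
The evaluation script itself is not included in this card; a minimal sketch of how such metrics could be computed on the held-out split, assuming scikit-learn and placeholder label/prediction lists:

```python
# Hypothetical metric computation; y_true/y_pred stand in for the test labels
# and the classifier's predictions on the 20% test split.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 0]
y_pred = [1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
```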