---
license: apache-2.0
---

model base: https://huggingface.co/microsoft/mdeberta-v3-base

dataset: https://github.com/ramybaly/Article-Bias-Prediction


training parameters:
- devices: 2xH100
- batch_size: 100
- epochs: 5
- dropout: 0.05
- max_length: 512
- learning_rate: 3e-5
- warmup_steps: 100
- random_state: 239
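
To illustrate how `learning_rate` and `warmup_steps` interact, here is a minimal sketch of a warmup schedule. It assumes a linear ramp from 0 to the peak rate, held constant afterwards; the card does not state which scheduler or decay was actually used, and the name `lr_at` is hypothetical.

```python
PEAK_LR = 3e-5       # learning_rate from the list above
WARMUP_STEPS = 100   # warmup_steps from the list above


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step, assuming linear warmup.

    Assumption: the rate ramps linearly from 0 to PEAK_LR and is then held
    constant; the card does not state the schedule or any decay.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return PEAK_LR * min(1.0, step / WARMUP_STEPS)


# Print the rate at a few steps during and after warmup.
for step in (0, 50, 100, 500):
    print(step, lr_at(step))
```

If the actual run decayed the rate after warmup, only the first 100 steps of this sketch would match it.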


training methodology:
- sanitize the dataset according to a specific rule set and use the random split provided with the dataset
- train on the train split and evaluate on the validation split after each epoch
- evaluate the test split only with the checkpoint that achieved the lowest validation loss
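
The checkpoint selection in the last step could be sketched as follows. This is only an illustration: the function name `select_best_epoch` is hypothetical, and the loss values are made-up placeholders, not results from this model.

```python
from typing import Sequence


def select_best_epoch(val_losses: Sequence[float]) -> int:
    """Return the 1-based epoch whose validation loss is lowest."""
    if not val_losses:
        raise ValueError("need at least one epoch")
    # On ties, min() keeps the earliest epoch.
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Placeholder values for five epochs, NOT results from this model.
example_losses = [0.9, 0.7, 0.6, 0.65, 0.8]
best = select_best_epoch(example_losses)
print(f"evaluate the test split with the epoch {best} checkpoint")
```

Touching the test split only once, with the selected checkpoint, keeps the test metrics from influencing model selection.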

result summary:
- across the five training epochs, the epoch x model achieved the lowest validation loss, x
- on the test split, the epoch x model achieved an F1 score of x and a test loss of x

usage:

```
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("premsa/political-bias-prediction-allsides-mDeBERTa")
tokenizer = AutoTokenizer.from_pretrained("premsa/political-bias-prediction-allsides-mDeBERTa")
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
# German input: "the masses are controlled by the media."
print(nlp("die massen werden von den medien kontrolliert."))
```