Update README.md
README.md
CHANGED
@@ -1,22 +1,45 @@
|
|
1 |
Welcome to **RoBERTArg**!
|
2 |
|
3 |
-
**
|
|
|
4 |
|
5 |
-
|
6 |
|
7 |
-
|
8 |
|
9 |
| Model | Accuracy | F1 | Recall (ARGUMENT) | Recall (NON-ARGUMENT) | Precision (ARGUMENT) | Precision (NON-ARGUMENT) |
|
10 |
|----|----|----|----|----|----|----|
|
11 |
| RoBERTArg | 0.8193 | 0.8021 | 0.8463 | 0.7986 | 0.7623 | 0.8719 |
|
12 |
|
13 |
-
**
|
14 |
|
15 |
| | ARGUMENT | NON-ARGUMENT |
|
16 |
|----|----|----|
|
17 |
| ARGUMENT | 2213 | 558 |
|
18 |
| NON-ARGUMENT | 325 | 1790 |
|
19 |
|
20 |
Check out _chkla/argument-analyzer/_ for more details.
|
21 |
|
22 |
Enjoy and stay tuned!
|
|
|
1 |
Welcome to **RoBERTArg**!
|
2 |
|
3 |
+
**Model description**:
|
4 |
+
This model was trained on ~40k heterogeneous, manually annotated sentences (Stab et al. 2018) on controversial topics (abortion etc.) to classify text into one of two labels: **NON-ARGUMENT** (0) and **ARGUMENT** (1).
|
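To make the label ids concrete, here is a minimal sketch of turning a pair of classifier logits into one of the two labels. The `ID2LABEL` mapping follows the description above; the helper name and the example logits are hypothetical, and the softmax step assumes a standard two-logit sequence-classification head.

```python
import math

# Label ids as described above: 0 = NON-ARGUMENT, 1 = ARGUMENT.
ID2LABEL = {0: "NON-ARGUMENT", 1: "ARGUMENT"}


def logits_to_label(logits):
    """Map two raw logits to (label, probability) via a softmax.

    Hypothetical helper for illustration; not part of the released model code.
    """
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, prob = logits_to_label([-1.2, 2.3])  # made-up logits
print(label, round(prob, 3))
```

With real model output, the two logits would come from the classification head for a single input sentence.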
5 |
|
6 |
+
**Dataset**
|
7 |
|
8 |
+
Please note that the label distribution in the dataset is imbalanced:
|
9 |
+
* NON-ARGUMENTS:
|
10 |
+
* ARGUMENTS:
|
11 |
+
|
12 |
+
**Model training**
|
13 |
+
**RoBERTArg** was fine-tuned from a pre-trained $RoBERTa_{base}$ model using the HuggingFace `Trainer` with the following hyperparameters, which were selected by a hyperparameter search on a 20% validation set.
|
14 |
+
|
15 |
+
```
|
16 |
+
training_args = TrainingArguments(
|
17 |
+
num_train_epochs=2,
|
18 |
+
learning_rate=2.3102e-06,
|
19 |
+
seed=8,
|
20 |
+
per_device_train_batch_size=64,
|
21 |
+
per_device_eval_batch_size=64,
|
22 |
+
)
|
23 |
+
```
|
24 |
+
|
25 |
+
**Evaluation**
|
26 |
+
The model was evaluated using 20% of the sentences (80-20 train-test split).
|
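A minimal sketch of an 80-20 split, for readers who want to reproduce the setup on their own data. It assumes a plain shuffled split; whether the original split was stratified or used this seed is not stated here, so the function name, the reuse of `seed=8`, and the toy sentences are all assumptions.

```python
import random


def train_test_split_80_20(sentences, seed=8):
    """Shuffle a copy of the data and hold out the last 20% for evaluation.

    Hypothetical helper; the seed default only mirrors the training seed
    and is not confirmed to be the one used for the original split.
    """
    items = list(sentences)
    random.Random(seed).shuffle(items)  # deterministic for a given seed
    cut = int(len(items) * 0.8)
    return items[:cut], items[cut:]


toy = [f"sentence {i}" for i in range(10)]  # placeholder data
train, test = train_test_split_80_20(toy)
print(len(train), len(test))  # 8 2
```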
27 |
|
28 |
| Model | Accuracy | F1 | Recall (ARGUMENT) | Recall (NON-ARGUMENT) | Precision (ARGUMENT) | Precision (NON-ARGUMENT) |
|
29 |
|----|----|----|----|----|----|----|
|
30 |
| RoBERTArg | 0.8193 | 0.8021 | 0.8463 | 0.7986 | 0.7623 | 0.8719 |
|
31 |
|
32 |
+
**Confusion matrix** on the 20% evaluation set:
|
33 |
|
34 |
| | ARGUMENT | NON-ARGUMENT |
|
35 |
|----|----|----|
|
36 |
| ARGUMENT | 2213 | 558 |
|
37 |
| NON-ARGUMENT | 325 | 1790 |
|
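The reported metrics can be cross-checked against the confusion matrix with a few lines of arithmetic. The sketch below assumes rows are true labels and columns are predictions, both in label-id order (0 = NON-ARGUMENT first, then 1 = ARGUMENT). Under that reading the counts agree with the metrics table to about three decimal places. Note that this reading treats the first row and column as NON-ARGUMENT, which differs from the header printed on the matrix; which of the two is intended is not stated in the source.

```python
# Counts copied from the confusion matrix above.
# ASSUMPTION: rows = true labels, columns = predicted labels,
# both ordered by label id (0 = NON-ARGUMENT, 1 = ARGUMENT).
cm = [[2213, 558],
      [325, 1790]]


def binary_metrics(cm):
    """Derive accuracy, per-class recall/precision and ARGUMENT F1 from a 2x2 matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "f1_arg": 2 * tp / (2 * tp + fp + fn),
        "recall_arg": tp / (tp + fn),
        "recall_non": tn / (tn + fp),
        "precision_arg": tp / (tp + fp),
        "precision_non": tn / (tn + fn),
    }


metrics = binary_metrics(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Small differences in the fourth decimal can come from rounding, for example if F1 was computed from already rounded precision and recall.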
38 |
|
39 |
+
**Intended Uses & Potential Limitations**
|
40 |
+
The model can be a practical starting point for the complex topic of **Argument Mining**, a challenging task because conceptions of what counts as an argument differ.
|
41 |
+
|
42 |
+
This model is part of an open-source project that provides several models for detecting arguments in text.
|
43 |
Check out _chkla/argument-analyzer/_ for more details.
|
44 |
|
45 |
Enjoy and stay tuned!
|