---
license: openrail++
language:
- uk
widget:
- text: Ти неймовірна!
datasets:
- ukr-detect/ukr-toxicity-dataset
base_model:
- FacebookAI/xlm-roberta-base
---
|
|
|
## Binary toxicity classifier for Ukrainian
|
|
|
This is an instance of ["xlm-roberta-base"](https://huggingface.co/xlm-roberta-base) fine-tuned for the downstream task of binary toxicity classification in Ukrainian.
|
|
|
The evaluation metrics for binary toxicity classification are:

**Precision**: 0.9130

**Recall**: 0.9065

**F1**: 0.9061
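
For reference, metrics of this kind can be computed from model predictions and gold labels with `sklearn.metrics`. This is a minimal sketch with illustrative placeholder labels; the averaging scheme used for the numbers reported above is not stated in this card.

```python
# Illustrative sketch only: y_true / y_pred are placeholders, not the actual evaluation data.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]  # gold labels (assumption: 1 = toxic, 0 = non-toxic)
y_pred = [1, 0, 1, 0, 0]  # model predictions

print('Precision:', precision_score(y_true, y_pred))
print('Recall:', recall_score(y_true, y_pred))
print('F1:', f1_score(y_true, y_pred))
```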
|
|
|
The training and evaluation data come from the [ukr-detect/ukr-toxicity-dataset](https://huggingface.co/datasets/ukr-detect/ukr-toxicity-dataset).
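
To inspect the data, the dataset can be loaded with the `datasets` library. The available splits and column names are not specified here, so the sketch only prints the dataset description.

```python
# Sketch of loading the dataset listed in the card metadata.
# Split and column names are assumptions and may differ from the actual dataset layout.
from datasets import load_dataset

dataset = load_dataset('ukr-detect/ukr-toxicity-dataset')
print(dataset)  # shows the available splits and features
```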
|
|
|
## How to use
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained('dardem/xlm-roberta-base-uk-toxicity')
model = AutoModelForSequenceClassification.from_pretrained('dardem/xlm-roberta-base-uk-toxicity')

# prepare the input
batch = tokenizer('Ти неймовірна!', return_tensors='pt')

# inference: logits over the two toxicity classes
with torch.no_grad():
    logits = model(**batch).logits

# index of the predicted class
predicted_class = logits.argmax(dim=-1).item()
```
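
Alternatively, the same checkpoint can be used through the high-level `pipeline` API. The label strings it returns depend on the model's `id2label` configuration, which is not documented above, so treat them as an assumption to verify.

```python
# Sketch using the transformers pipeline API with the same checkpoint.
# Returned label names depend on the model's id2label config (e.g. generic LABEL_0 / LABEL_1).
from transformers import pipeline

classifier = pipeline('text-classification', model='dardem/xlm-roberta-base-uk-toxicity')
print(classifier('Ти неймовірна!'))
```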
|
|
|
## Citation
|
|
|
```
@article{dementieva2024toxicity,
  title={Toxicity Classification in Ukrainian},
  author={Dementieva, Daryna and Khylenko, Valeriia and Babakov, Nikolay and Groh, Georg},
  journal={arXiv preprint arXiv:2404.17841},
  year={2024}
}
```