---
license: openrail++
language:
- uk
widget:
- text: Ти неймовірна!
datasets:
- ukr-detect/ukr-toxicity-dataset
base_model:
- FacebookAI/xlm-roberta-base
---
|
|
|
## Binary toxicity classifier for Ukrainian
|
|
|
This is an instance of ["xlm-roberta-base"](https://huggingface.co/xlm-roberta-base) fine-tuned for the downstream task of binary toxicity classification in Ukrainian.
|
|
|
The evaluation metrics for binary toxicity classification are:

**Precision**: 0.9130

**Recall**: 0.9065

**F1**: 0.9061
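
For reference, metrics of this kind can be computed from model predictions and gold labels with `sklearn.metrics`. This is a minimal sketch with illustrative placeholder labels; the averaging scheme used for the numbers reported above is not stated in this card.

```python
# Illustrative sketch only: y_true / y_pred are placeholders, not the actual evaluation data.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]  # gold labels (assumption: 1 = toxic, 0 = non-toxic)
y_pred = [1, 0, 1, 0, 0]  # model predictions

print('Precision:', precision_score(y_true, y_pred))
print('Recall:', recall_score(y_true, y_pred))
print('F1:', f1_score(y_true, y_pred))
```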
|
|
|
The training and evaluation data come from the [ukr-detect/ukr-toxicity-dataset](https://huggingface.co/datasets/ukr-detect/ukr-toxicity-dataset).
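
To inspect the data, the dataset can be loaded with the `datasets` library. The available splits and column names are not specified here, so the sketch only prints the dataset description.

```python
# Sketch of loading the dataset listed in the card metadata.
# Split and column names are assumptions and may differ from the actual dataset layout.
from datasets import load_dataset

dataset = load_dataset('ukr-detect/ukr-toxicity-dataset')
print(dataset)  # shows the available splits and features
```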
|
|
|
## How to use
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained('dardem/xlm-roberta-base-uk-toxicity')
model = AutoModelForSequenceClassification.from_pretrained('dardem/xlm-roberta-base-uk-toxicity')

# prepare the input
batch = tokenizer('Ти неймовірна!', return_tensors='pt')

# inference: logits over the two toxicity classes
with torch.no_grad():
    logits = model(**batch).logits

# index of the predicted class
predicted_class = logits.argmax(dim=-1).item()
```
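
Alternatively, the same checkpoint can be used through the high-level `pipeline` API. The label strings it returns depend on the model's `id2label` configuration, which is not documented above, so treat them as an assumption to verify.

```python
# Sketch using the transformers pipeline API with the same checkpoint.
# Returned label names depend on the model's id2label config (e.g. generic LABEL_0 / LABEL_1).
from transformers import pipeline

classifier = pipeline('text-classification', model='dardem/xlm-roberta-base-uk-toxicity')
print(classifier('Ти неймовірна!'))
```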
|
|
|
## Citation
|
|
|
```
@article{dementieva2024toxicity,
  title={Toxicity Classification in Ukrainian},
  author={Dementieva, Daryna and Khylenko, Valeriia and Babakov, Nikolay and Groh, Georg},
  journal={arXiv preprint arXiv:2404.17841},
  year={2024}
}
```