|
--- |
|
language: |
|
- sk |
|
tags: |
|
- twitter |
|
license: cc |
|
metrics: |
|
- f1 |
|
widget: |
|
- text: "Najkrajšia vianočná reklama: Toto milé video vám vykúzli čarovnú atmosféru: Vianoce sa nezadržateľne blížia." |
|
- text: "A opäť sa objavili nebezpečné výrobky. Pozrite sa, či ich nemáte doma" |
|
--- |
|
|
|
|
|
# Sentiment Analysis model based on SlovakBERT |
|
|
|
This is a sentiment analysis classifier based on [SlovakBERT](https://huggingface.co/gerulata/slovakbert). The model can distinguish three level of sentiment: |
|
|
|
- `-1` - Negative sentiment |
|
- `0` - Neutral sentiment |
|
- `1` - Positive setiment |
|
|
|
The model was fine-tuned using Slovak part of [Multilingual Twitter Sentiment Analysis Dataset](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155036) [Mozetič et al 2016] containing 50k manually annotated Slovak tweets. As such, it is fine-tuned for tweets and it is not advised to use the model for general-purpose sentiment analysis. |
|
|
|
## Results |
|
|
|
The model was evaluated in [our paper](https://arxiv.org/abs/2109.15254) [Pikuliak et al 2021, Section 4.4]. It achieves \\(0.67\\) F1-score on the original dataset and \\(0.58\\) F1-score on general reviews dataset. |
|
|
|
## Cite |
|
|
|
``` |
|
@article{DBLP:journals/corr/abs-2109-15254, |
|
author = {Mat{\'{u}}{\v{s}} Pikuliak and |
|
{\v{S}}tefan Grivalsk{\'{y}} and |
|
Martin Kon{\^{o}}pka and |
|
Miroslav Bl{\v{s}}t{\'{a}}k and |
|
Martin Tamajka and |
|
Viktor Bachrat{\'{y}} and |
|
Mari{\'{a}}n {\v{S}}imko and |
|
Pavol Bal{\'{a}}{\v{z}}ik and |
|
Michal Trnka and |
|
Filip Uhl{\'{a}}rik}, |
|
title = {SlovakBERT: Slovak Masked Language Model}, |
|
journal = {CoRR}, |
|
volume = {abs/2109.15254}, |
|
year = {2021}, |
|
url = {https://arxiv.org/abs/2109.15254}, |
|
eprinttype = {arXiv}, |
|
eprint = {2109.15254}, |
|
} |
|
``` |