---
language: fa
license: apache-2.0
---

This repository aims to provide better models for natural language inference (NLI) in Persian, together with transparent training code. I hope you find it inspiring and build better models in the future. For more details about the task and the methods used for training, see the [Medium post](https://haddadhesam.medium.com/) and the notebooks.

# Dataset

The dataset used for training is the Wiki D/Similar dataset (wiki-d-similar.zip), obtained from the [Sentence Transformers](https://github.com/m3hrdadfi/sentence-transformers) repository.

# Model

The proposed model is published on the HuggingFace Hub under the name ``demoversion/bert-fa-base-uncased-haddad-wikinli``. You can download the model from the [HuggingFace website](https://huggingface.co/demoversion/bert-fa-base-uncased-haddad-wikinli) or use it directly with the transformers library like this:

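```python
from transformers import pipeline

# Load the model into a zero-shot classification pipeline.
model = pipeline("zero-shot-classification", model="demoversion/bert-fa-base-uncased-haddad-wikinli")

# Candidate labels: sports, political, scientific, cultural.
labels = ["ورزشی", "سیاسی", "علمی", "فرهنگی"]
# Hypothesis template: "This is a {} text."
template_str = "این یک متن {} است."
# Input sentence: "The preliminary stage of the World Cup was full of controversy."
str_sentence = "مرحله مقدماتی جام جهانی حاشیه‌های زیادی داشت."

model(str_sentence, labels, hypothesis_template=template_str)
```
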
The result of this code snippet is:

```
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
{'labels': ['فرهنگی', 'علمی', 'سیاسی', 'ورزشی'],
 'scores': [0.25921085476875305,
  0.25713297724723816,
  0.24884170293807983,
  0.23481446504592896],
 'sequence': 'مرحله مقدماتی جام جهانی حاشیه\u200cهای زیادی داشت.'}
```

The pipeline returns the labels sorted by score, so the first label is the model's prediction, made without any task-specific training.

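For readers new to zero-shot classification with an NLI model, here is a rough sketch of what happens around the model call in the snippet above. To my understanding of the transformers pipeline, each candidate label is substituted into `hypothesis_template`, the input sentence is paired with every resulting hypothesis, and in the default single-label mode the per-label entailment logits are passed through a softmax, which is why the scores sum to roughly 1. The helper names `build_nli_pairs` and `scores_from_entailment_logits` are hypothetical and not part of transformers, and the logits are made up for illustration rather than produced by the model.

```python
import math


def build_nli_pairs(premise, labels, hypothesis_template):
    """Pair the premise with one hypothesis per candidate label (hypothetical helper)."""
    return [(premise, hypothesis_template.format(label)) for label in labels]


def scores_from_entailment_logits(labels, entail_logits):
    """Softmax over per-label entailment logits, sorted by descending score."""
    exps = [math.exp(logit) for logit in entail_logits]
    total = sum(exps)
    ranked = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


# Same inputs as in the pipeline example.
labels = ["ورزشی", "سیاسی", "علمی", "فرهنگی"]
template_str = "این یک متن {} است."
str_sentence = "مرحله مقدماتی جام جهانی حاشیه‌های زیادی داشت."

# One (premise, hypothesis) pair per label; the real pipeline tokenizes these
# pairs and feeds them to the NLI model.
pairs = build_nli_pairs(str_sentence, labels, template_str)
for _, hypothesis in pairs:
    print(hypothesis)

# Made-up logits for illustration only; they are NOT outputs of the model.
fake_logits = [0.10, 0.05, 0.02, -0.04]
result = scores_from_entailment_logits(labels, fake_logits)
print(result["labels"][0], round(sum(result["scores"]), 6))
```

Because the scores are normalized across the candidate labels, they are only meaningful relative to each other, and adding or removing a label changes every score.
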
# Results

The table below compares the results with those of the original model published for this dataset.

|Model|dev_accuracy|dev_f1|test_accuracy|test_f1|
|--|--|--|--|--|
|[m3hrdadfi/bert-fa-base-uncased-wikinli](https://huggingface.co/m3hrdadfi/bert-fa-base-uncased-wikinli)|77.88|77.57|76.64|75.99|
|[demoversion/bert-fa-base-uncased-haddad-wikinli](https://huggingface.co/demoversion/bert-fa-base-uncased-haddad-wikinli)|**78.62**|**79.74**|**77.04**|**78.56**|

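To make the size of the gap easier to see, the small sketch below computes the per-metric difference between the two rows of the table above. The numbers are copied from the table; the variable names are only illustrative.

```python
# Scores copied from the results table, in the table's own units.
METRICS = ("dev_accuracy", "dev_f1", "test_accuracy", "test_f1")
baseline = dict(zip(METRICS, (77.88, 77.57, 76.64, 75.99)))  # m3hrdadfi/bert-fa-base-uncased-wikinli
proposed = dict(zip(METRICS, (78.62, 79.74, 77.04, 78.56)))  # demoversion/bert-fa-base-uncased-haddad-wikinli

# Difference (proposed minus baseline), rounded to two decimals.
deltas = {name: round(proposed[name] - baseline[name], 2) for name in METRICS}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

The differences are larger on F1 (2.17 on dev, 2.57 on test) than on accuracy (0.74 on dev, 0.40 on test).
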
# Notebooks

The notebooks used for training and evaluation are available below.

[Training ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DemoVersion/persian-nli-trainer/blob/main/notebooks/training.ipynb)

[Evaluation ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DemoVersion/persian-nli-trainer/blob/main/notebooks/evaluation.ipynb)