---
language: id
license: mit
datasets:
- indonli
- MoritzLaurer/multilingual-NLI-26lang-2mil7
pipeline_tag: zero-shot-classification
widget:
- text: Saya suka makan kentang goreng.
candidate_labels: positif, netral, negatif
hypothesis_template: Kalimat ini mengandung tema {}.
multi_class: false
example_title: Sentiment
- text: Apple umumkan harga iPhone 14.
candidate_labels: teknologi, olahraga, kuliner, bisnis
hypothesis_template: Kalimat ini mengandung tema {}.
multi_class: true
example_title: News
model-index:
- name: ilos-vigil/bigbird-small-indonesian-nli
results:
- task:
type: natural-language-inference
name: Natural Language Inference
dataset:
name: indonli
type: indonli
config: indonli
split: test_expert
metrics:
- type: accuracy
value: 0.5385388739946381
name: Accuracy
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNWRhZDkxNmI2NzE3MzRlYmNlMWFjZDVmNWUwYmMwN2IxYzNjMWE4YzY4NWI3NDZkYTMzY2NjN2MyZGQ5YzEwZSIsInZlcnNpb24iOjF9.AgizskHeXOzs0v93DNojNoqR_-1bQsYBokL8jcfelFm-zt-r5YXt89WXBDLLg4oKv-Roj8sLhUwe7ei0Mf1-Ag
- type: f1
value: 0.530444188199697
name: F1 Macro
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjk2YTFhY2E3NGIzNzgxY2M5YzUzNGUzYTAwOWZkNGU3Y2I5MDA1MTc0YzM4Yjg0MmIzY2Y5M2EzOGYxNjY4NiIsInZlcnNpb24iOjF9.YZ_fTuVftTCM6SFfkFCLPbJWYmYNMYL9PNHUwNFHQXZeknf6OCBgQtr1gF6VM9mX6WuU4OKEl12tsAytlkm7Ag
- type: f1
value: 0.5385388739946381
name: F1 Micro
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2MxMGUyZmJhZTYzN2M4NDlkMTZmMzllOGVhMjRiODhkMGVkMGMxMjY2NDBkZWM3ZWY2ZjhmZTNmYWU5ZjEzMyIsInZlcnNpb24iOjF9.f0HQlPRx4VFnOOHsrvMKFni8g1B1OJfheOyADsf47GnrvCcW_dakDgBy5c_yy4TehQYRa6ToYGHnuQnemvhnBg
- type: f1
value: 0.5299257731385174
name: F1 Weighted
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTgzZjJkZWU0NDgyMGU5MDFmNzk2OWY1OWY4MzA2NTE3MDAxN2Y2MWExODJkYjdlN2I1YzgzYjljNjdkMTc1YiIsInZlcnNpb24iOjF9.lWB7MZlAiDjskKM-lx-XtLxTQYuWLz3QjyseDuZe_AxtyOKt2GZkP2NDOZxEWketHjRiTCQfBUvSfzFId-FCAg
- type: precision
value: 0.5592571894118881
name: Precision Macro
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDQxYTFlNTNjNDAwMWIxYmJlMzRkN2U5OWY1NWNjN2YyYTE2NzRjNjM3ZWNhMzM4NjFhYWM4MzJkYjY3MzU0YSIsInZlcnNpb24iOjF9.6OI4_M1wLX1Z1BztKUfZ-382F3coCeJjarsWc-J04TKpsFCddLjuF5ZDuBFmokpz4goRgx-FlH-5jCAsFkzkBg
- type: precision
value: 0.5385388739946381
name: Precision Micro
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzRmY2I4YTAzMTRkMjFjNTE1NTEwZDlmZGQ4NDUyYTAxY2JhOTliMDRhNWY3OGY4OWRlNTlkNzcxODc0MDMwYyIsInZlcnNpb24iOjF9.X7ekS-JYOXH5eNmSfKQ_no1rNAbuQ3C0pNYvorPVfcna6RU8n6O6FNQor0AWvatAWdefJG6H3J7_GoC6M5zECw
- type: precision
value: 0.5586108016541553
name: Precision Weighted
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjUwNjMxYjEwMTEzNzAwNzQwZDQwMTRmZDM2ZDk0ZDc3YTUxOTQzNDE5ZWI2NWI4MmJmODAxYTlmN2E0Nzk2MCIsInZlcnNpb24iOjF9.nAO1wRFHMtm5kem9VhuuRg54fpvA2uzwEutjzsnZoyemUHbI2U_1TK_dDmR4bmpPjVnCZt5sF-jEq4oZIaIbDQ
- type: recall
value: 0.5385813032215204
name: Recall Macro
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzVkNjliYTM0Njc3MTUzMDBmYTE5NDRkNzFjNzg2NzA0NzEyMTg4YTlkNGFlZWMxZWUwOGQzYzY1ZGU0ZmIwNyIsInZlcnNpb24iOjF9.cnEbDBJR8m3UqiuzCq_g4RUFLE8BVzXDebKguVrwPgY-Biu4sBFXVQvFyZScsLGEnaHYsE-R8ctTEGDdQONVBw
- type: recall
value: 0.5385388739946381
name: Recall Micro
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODZkMmNjZWY4ZDYyYjU3NjQ2ZGNhZjkyNTQyOTg2ZjNmNDgwNDYxYmU2ZDA5M2EwOWRlMjMyYmI4MGU3MGMxNCIsInZlcnNpb24iOjF9.BfMB4_MZ-SYj1YbTES8pqgKNQkNnevSOjAwUqdoL6wsNpsKKWxPHmq0Kt9XufxHoQoyTkGvPfxh-0jEe3B1nBg
- type: recall
value: 0.5385388739946381
name: Recall Weighted
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYmE3Yjg3OTVhMjdlMDk1YWFjMWIwNjMyZTA2Yzc3MjBlNjI1YWY5MzE0MjNkMDNiMmU5ZmIxYWExNmViYWE1NSIsInZlcnNpb24iOjF9.S9Bo-wq3wikFS-FqMQerxahu87PJyYx141G5PCWDtOs2wH1nf4texnJYWfHeVCJKZcKmS2RWn5XOjjJ9RoNJAA
- type: loss
value: 1.062397837638855
name: loss
verified: true
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTFmNDI0ZmQ2YmNlZjJlZTdmZTYwOGVkMjdjMjJkMDIzNzhlOWFiNWQzNjFiMmU5NTdiM2Y1YjYxMjU4ZjQ2ZSIsInZlcnNpb24iOjF9.15RsFRkFpbarlU1L8UyV0o0_5WCveO_mT9CdO0UYwvQsOVjScheJ8fOqHBAC-C-CMTlfFNsmMhNrU_np8c_ZCQ
---
# Indonesian small BigBird model NLI
## Source Code
The source code used to create this model and run the benchmarks is available at [https://github.com/ilos-vigil/bigbird-small-indonesian](https://github.com/ilos-vigil/bigbird-small-indonesian).
## Model Description
This model is based on [bigbird-small-indonesian](https://huggingface.co/ilos-vigil/bigbird-small-indonesian) and was fine-tuned on two NLI datasets. It is intended for zero-shot text classification.
## How to use
> Inference for ZSC (Zero Shot Classification) task
```py
>>> from transformers import pipeline
>>> pipe = pipeline(
... task='zero-shot-classification',
... model='./tmp/checkpoint-28832'
... )
>>> pipe(
... sequences='Fakta nomor 7 akan membuat ada terkejut',
... candidate_labels=['clickbait', 'bukan clickbait'],
... hypothesis_template='Judul video ini {}.',
... multi_label=False
... )
{
'sequence': 'Fakta nomor 7 akan membuat ada terkejut',
'labels': ['clickbait', 'bukan clickbait'],
'scores': [0.6102734804153442, 0.38972654938697815]
}
>>> pipe(
... sequences='Samsung tuntut balik Apple dengan alasan hak paten teknologi.',
... candidate_labels=['teknologi', 'olahraga', 'bisnis', 'politik', 'kesehatan', 'kuliner'],
... hypothesis_template='Kategori berita ini adalah {}.',
... multi_label=True
... )
{
'sequence': 'Samsung tuntut balik Apple dengan alasan hak paten teknologi.',
'labels': ['politik', 'teknologi', 'kesehatan', 'bisnis', 'olahraga', 'kuliner'],
'scores': [0.7390161752700806, 0.6657379269599915, 0.4459509551525116, 0.38407933712005615, 0.3679264783859253, 0.14181996881961823]
}
```
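The `multi_label` flag changes how NLI logits become label scores, which is why the scores of the first call above sum to 1 while those of the second do not. The sketch below reproduces that post-processing in plain Python, following how the `transformers` zero-shot pipeline is commonly described. The helper names `softmax` and `zsc_scores`, the logit values, and the (entailment, neutral, contradiction) ordering are assumptions made for illustration, not outputs of this model.

```py
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def zsc_scores(logits_per_label, multi_label):
    """Turn per-label NLI logits into zero-shot scores.

    logits_per_label maps each candidate label to a hypothetical
    (entailment, neutral, contradiction) logit triple obtained by
    pairing the text with the filled-in hypothesis template.
    """
    labels = list(logits_per_label)
    if multi_label:
        # Each label is judged on its own: softmax over
        # (contradiction, entailment) and keep the entailment share.
        scores = [
            softmax([logits_per_label[label][2], logits_per_label[label][0]])[1]
            for label in labels
        ]
    else:
        # Labels compete: softmax over the entailment logits of all labels.
        scores = softmax([logits_per_label[label][0] for label in labels])
    return dict(zip(labels, scores))

# Made-up logits for two candidate labels.
example_logits = {
    'teknologi': (1.2, 0.1, -0.8),
    'olahraga': (-1.0, 0.3, 0.9),
}
single_label_scores = zsc_scores(example_logits, multi_label=False)
multi_label_scores = zsc_scores(example_logits, multi_label=True)
```

With `multi_label=False` the scores form a distribution over the candidate labels, so adding an unrelated label lowers every other score. With `multi_label=True` each score is independent and several labels can exceed 0.5 at once.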
> Inference for NLI (Natural Language Inference) task
```py
>>> from transformers import pipeline
>>> pipe = pipeline(
... task='text-classification',
... model='./tmp/checkpoint-28832',
... return_all_scores=True
... )
>>> pipe({
... 'text': 'Nasi adalah makanan pokok.', # Premise
... 'text_pair': 'Saya mau makan nasi goreng.' # Hypothesis
... })
[
{'label': 'entailment', 'score': 0.25495028495788574},
{'label': 'neutral', 'score': 0.40920916199684143},
{'label': 'contradiction', 'score': 0.33584052324295044}
]
>>> pipe({
... 'text': 'Python sering digunakan untuk web development dan AI research.',
... 'text_pair': 'AI research biasanya tidak menggunakan bahasa pemrograman Python.'
... })
[
{'label': 'entailment', 'score': 0.12508109211921692},
{'label': 'neutral', 'score': 0.22146646678447723},
{'label': 'contradiction', 'score': 0.653452455997467}
]
```
## Limitation and bias
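This model inherits the limitations and biases of its parent model and of the two datasets used for fine-tuning. Like most language models, it is also sensitive to changes in the input. In the example below, only the hypothesis template changes, yet the top score for the same text drops from about 0.77 to about 0.60.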
```py
>>> from transformers import pipeline
>>> pipe = pipeline(
... task='zero-shot-classification',
... model='./tmp/checkpoint-28832'
... )
>>> text = 'Resep sate ayam enak dan mudah.'
>>> candidate_labels = ['kuliner', 'olahraga']
>>> pipe(
... sequences=text,
... candidate_labels=candidate_labels,
... hypothesis_template='Kategori judul artikel ini adalah {}.',
... multi_label=False
... )
{
'sequence': 'Resep sate ayam enak dan mudah.',
'labels': ['kuliner', 'olahraga'],
'scores': [0.7711364030838013, 0.22886358201503754]
}
>>> pipe(
... sequences=text,
... candidate_labels=candidate_labels,
... hypothesis_template='Kelas kalimat ini {}.',
... multi_label=False
... )
{
'sequence': 'Resep sate ayam enak dan mudah.',
'labels': ['kuliner', 'olahraga'],
'scores': [0.7043636441230774, 0.295636385679245]
}
>>> pipe(
... sequences=text,
... candidate_labels=candidate_labels,
... hypothesis_template='{}.',
... multi_label=False
... )
{
'sequence': 'Resep sate ayam enak dan mudah.',
'labels': ['kuliner', 'olahraga'],
'scores': [0.5986711382865906, 0.4013288915157318]
}
```
## Training, evaluation and testing data
This model was fine-tuned on [IndoNLI](https://huggingface.co/datasets/indonli) and [multilingual-NLI-26lang-2mil7](https://huggingface.co/datasets/MoritzLaurer/multilingual-NLI-26lang-2mil7). Although the `multilingual-NLI-26lang-2mil7` dataset is machine-translated, it slightly improves results on the NLI benchmark and substantially improves results on the ZSC benchmark. The evaluation and test data come only from the IndoNLI dataset.
## Training Procedure
The model was fine-tuned on a single RTX 3060 for 16 epochs (28,832 steps) with an accumulated batch size of 64. The AdamW optimizer was used with a learning rate of 1e-4, a weight decay of 0.05, learning rate warmup over the first 6% of steps (1,730 steps), and linear decay of the learning rate afterwards. Note that although the model weights at epoch 9 have the lowest loss and highest accuracy, they perform slightly worse on the ZSC benchmark. Additional information is available in the TensorBoard training logs.
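
The schedule described above can be checked with a few lines of arithmetic. The sketch below is not the training script; it only recomputes the step counts from the numbers in the text and evaluates a linear warmup followed by linear decay. Rounding the warmup step count up and the helper name `lr_at_step` are assumptions.

```py
import math

TOTAL_STEPS = 28832   # total optimizer steps, from the text
EPOCHS = 16           # from the text
PEAK_LR = 1e-4        # from the text
WARMUP_RATIO = 0.06   # "first 6% steps", from the text

# 28832 / 16 optimizer steps per epoch.
steps_per_epoch = TOTAL_STEPS // EPOCHS

# Assumption: the warmup length is rounded up to a whole step.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at_step(step):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / warmup_steps
    # Linear decay from the peak learning rate down to 0 at the last step.
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / (TOTAL_STEPS - warmup_steps))

for step in (0, warmup_steps // 2, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, lr_at_step(step))
```

Under these assumptions the warmup length matches the 1,730 steps stated above, and the learning rate peaks exactly at the end of warmup before decaying to zero at step 28,832.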
## Benchmark as NLI model
Both benchmarks include results from two other models for comparison. Additional benchmarks on the IndoNLI dataset are available in its paper, [IndoNLI: A Natural Language Inference Dataset for Indonesian](https://aclanthology.org/2021.emnlp-main.821/).
| Model | bigbird-small-indonesian-nli | xlm-roberta-large-xnli | mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 |
| ------------------------------------------ | ---------------------------- | ---------------------- | -------------------------------------------- |
| Parameters | 30.6M | 559.9M | 278.8M |
| Multilingual | | V | V |
| Finetuned on IndoNLI | V | | V |
| Finetuned on multilingual-NLI-26lang-2mil7 | V | | |
| Test (Lay) | 0.6888 | 0.2226 | 0.8151 |
| Test (Expert) | 0.5734 | 0.3505 | 0.7775 |
## Benchmark as ZSC model
[Indonesian-Twitter-Emotion-Dataset](https://github.com/meisaputri21/Indonesian-Twitter-Emotion-Dataset/) is used for the ZSC benchmark. The benchmark covers four configurations, combining multi-label on or off with the hypothesis template on or off, and each configuration affects the models differently. The hypothesis template is `Kalimat ini mengekspresikan perasaan {}.` when a template is used and `{}.` otherwise. Note that the F1 score only takes the label with the highest probability into account.
| Model | Multi-label | Use template | F1 Score |
| -------------------------------------------- | ----------- | ------------ | ------------ |
| bigbird-small-indonesian-nli | V | V | 0.3574 |
| | V | | 0.3654 |
| | | V | 0.3985 |
| | | | _0.4160_ |
| xlm-roberta-large-xnli | V | V | _**0.6292**_ |
| | V | | 0.5596 |
| | | V | 0.5737 |
| | | | 0.5433 |
| mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 | V | V | 0.5324 |
| | V | | _0.5499_ |
| | | V | 0.5269 |
| | | | 0.5228 |
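
Scoring only the highest-probability label, as the ZSC benchmark does, can be sketched as follows. The averaging mode used by the benchmark is not stated here, so macro averaging is an assumption. The helper names `top_label` and `macro_f1`, the example labels, and the scores are also made up for illustration.

```py
def top_label(result):
    """Return the highest-scoring label from one pipeline result dict."""
    scores = result['scores']
    best = max(range(len(scores)), key=lambda i: scores[i])
    return result['labels'][best]

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_label = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)

# Made-up pipeline results, in the same shape as the outputs shown earlier.
results = [
    {'labels': ['senang', 'sedih'], 'scores': [0.70, 0.30]},
    {'labels': ['sedih', 'senang'], 'scores': [0.60, 0.40]},
    {'labels': ['senang', 'sedih'], 'scores': [0.55, 0.45]},
]
gold = ['senang', 'sedih', 'sedih']

predictions = [top_label(r) for r in results]
score = macro_f1(gold, predictions)
print(predictions, round(score, 4))
```

Because only the top label is kept, a multi-label run is still scored as a single-label prediction, so the multi-label rows in the table differ from the single-label rows only in how the per-label scores were computed.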