---
license: gpl-3.0
language:
- pl
pipeline_tag: fill-mask
widget:
- text: "Kartony to inaczej [MASK], które produkowane są z tektury."
---

# Model Card for KartonBERT_base_uncased_v1

This is a classic Polish BERT model trained with the masked language modeling (MLM) objective.
It comes with a custom BWPT tokenizer of roughly 23k tokens. While not ideal,
it performs well on certain downstream tasks and serves as a checkpoint in my work.

## Model Description

- **Developed by:** Bartłomiej Orlik, https://www.linkedin.com/in/bartłomiej-orlik/
- **Model type:** pretrained BERT base, uncased (~23k-token tokenizer)
- **Language:** Polish
- **License:** GPL-3.0

## How to use the model for the fill-mask task

Use the code below to get started with the model.

```python
from transformers import pipeline

# Truncate inputs that exceed the 512-token limit.
tokenizer_kwargs = {'truncation': True, 'max_length': 512}
model = pipeline('fill-mask', model='OrlikB/KartonBERT_base_uncased_v1', tokenizer_kwargs=tokenizer_kwargs)

model("Kartony to inaczej [MASK], które produkowane są z tektury.")

# Output
[{'score': 0.12927177548408508,
  'token': 5324,
  'token_str': 'materiały',
  'sequence': 'kartony to inaczej materiały, które produkowane są z tektury.'},
 {'score': 0.0821441262960434,
  'token': 2403,
  'token_str': 'produkty',
  'sequence': 'kartony to inaczej produkty, które produkowane są z tektury.'},
 {'score': 0.06760794669389725,
  'token': 392,
  'token_str': 'te',
  'sequence': 'kartony to inaczej te, które produkowane są z tektury.'},
 {'score': 0.06753358244895935,
  'token': 20289,
  'token_str': 'pudełka',
  'sequence': 'kartony to inaczej pudełka, które produkowane są z tektury.'},
 {'score': 0.04844100773334503,
  'token': 16715,
  'token_str': 'wyroby',
  'sequence': 'kartony to inaczej wyroby, które produkowane są z tektury.'}]
```
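
The pipeline returns a list of dictionaries, one per candidate token, so a small post-processing step is often handy. The sketch below is one possible way to keep only candidates above a score threshold. The helper name `filter_predictions` and its `min_score` parameter are hypothetical and not part of `transformers`; the sample data is copied from the output above, with scores shortened for readability.

```python
from typing import Dict, List, Tuple


def filter_predictions(
    predictions: List[Dict], min_score: float = 0.05
) -> List[Tuple[str, float]]:
    """Return (token_str, score) pairs whose score is at least `min_score`.

    `predictions` is assumed to be the list of dicts produced by a
    fill-mask pipeline call on one input containing a single [MASK].
    """
    kept = [
        (p["token_str"], round(p["score"], 4))
        for p in predictions
        if p["score"] >= min_score
    ]
    # The pipeline already sorts by score; sorting again keeps the helper
    # safe for inputs that were reordered or merged by the caller.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Sample taken from the output shown above (scores rounded to 4 decimals).
sample = [
    {"score": 0.1293, "token": 5324, "token_str": "materiały"},
    {"score": 0.0821, "token": 2403, "token_str": "produkty"},
    {"score": 0.0676, "token": 392, "token_str": "te"},
    {"score": 0.0675, "token": 20289, "token_str": "pudełka"},
    {"score": 0.0484, "token": 16715, "token_str": "wyroby"},
]

print(filter_predictions(sample))
```

With the default threshold of 0.05, the last candidate (`wyroby`) is dropped and the other four are kept. In practice the helper could be applied directly to the return value of the `model(...)` call shown earlier.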