# CZERT

This repository hosts the trained Czert-B model for the paper [Czert – Czech BERT-like Model for Language Representation](https://arxiv.org/abs/2103.13031). For more information, see the paper.

## How to Use CZERT?

### Sentence Level Tasks

We evaluate our model on two sentence level tasks:

* Sentiment Classification,
* Semantic Text Similarity.

### Document Level Tasks

We evaluate our model on one document level task:

* Multi-label Document Classification.

### Token Level Tasks

We evaluate our model on three token level tasks:

* Named Entity Recognition,
* Morphological Tagging,
* Semantic Role Labelling.

## Downstream Tasks Fine-tuning Results

### Sentiment Classification

|      | mBERT        | SlavicBERT   | ALBERT-r     | Czert-A      | Czert-B          |
|:----:|:------------:|:------------:|:------------:|:------------:|:----------------:|
| FB   | 71.72 ± 0.91 | 73.87 ± 0.50 | 59.50 ± 0.47 | 72.47 ± 0.72 | **76.55 ± 0.14** |
| CSFD | 82.80 ± 0.14 | 82.51 ± 0.14 | 75.40 ± 0.18 | 79.58 ± 0.46 | **84.79 ± 0.26** |

Average F1 results for the Sentiment Classification task. For more information, see [the paper](https://arxiv.org/abs/2103.13031).

### Semantic Text Similarity

|              | **mBERT**      | **Pavlov**     | **Albert-random** | **Czert-A**    | **Czert-B**        |
|:-------------|:--------------:|:--------------:|:-----------------:|:--------------:|:------------------:|
| STS-CNA      | 83.335 ± 0.063 | 83.593 ± 0.050 | 43.184 ± 0.125    | 82.942 ± 0.106 | **84.345 ± 0.028** |
| STS-SVOB-img | 79.367 ± 0.486 | 79.900 ± 0.810 | 15.739 ± 2.992    | 79.444 ± 0.338 | **83.744 ± 0.395** |
| STS-SVOB-hl  | 78.833 ± 0.296 | 76.996 ± 0.305 | 33.949 ± 1.807    | 75.089 ± 0.806 | **79.827 ± 0.469** |

Comparison of Pearson correlation achieved using pre-trained CZERT-A, CZERT-B, mBERT, Pavlov and randomly initialised Albert on semantic text similarity. For more information, see [the paper](https://arxiv.org/abs/2103.13031).

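For readers unfamiliar with the metric, the following is a minimal sketch of how a Pearson correlation between gold and predicted similarity scores could be computed. It is not the evaluation code used for the paper. The function name `pearson` and the toy score lists are hypothetical and serve only as illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: gold similarity scores and model predictions.
gold = [0.0, 1.5, 3.0, 4.5]
predicted = [0.2, 1.4, 2.9, 4.8]
print(pearson(gold, predicted))
```

The coefficient lies in [-1, 1]. The values in the table above appear to be reported on a 0 to 100 scale, which is an assumption based on their magnitude.
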
### Multi-label Document Classification

|       | mBERT        | SlavicBERT   | ALBERT-r     | Czert-A      | Czert-B          |
|:-----:|:------------:|:------------:|:------------:|:------------:|:----------------:|
| AUROC | 97.62 ± 0.08 | 97.80 ± 0.06 | 94.35 ± 0.13 | 97.49 ± 0.07 | **98.00 ± 0.04** |
| F1    | 83.04 ± 0.16 | 84.08 ± 0.14 | 72.44 ± 0.22 | 82.27 ± 0.17 | **85.06 ± 0.11** |

Comparison of F1 and AUROC scores achieved using pre-trained CZERT-A, CZERT-B, mBERT, Pavlov and randomly initialised Albert on multi-label document classification. For more information, see [the paper](https://arxiv.org/abs/2103.13031).

### Morphological Tagging

|                        | mBERT          | Pavlov         | Albert-random  | Czert-A        | Czert-B            |
|:-----------------------|:---------------|:---------------|:---------------|:---------------|:-------------------|
| Universal Dependencies | 99.176 ± 0.006 | 99.211 ± 0.008 | 96.590 ± 0.096 | 98.713 ± 0.008 | **99.300 ± 0.009** |

Comparison of F1 score achieved using pre-trained CZERT-A, CZERT-B, mBERT, Pavlov and randomly initialised Albert on the morphological tagging task. For more information, see [the paper](https://arxiv.org/abs/2103.13031).

### Semantic Role Labelling