Model description
robbert-base-v2-NER-NL-legislation-refs is a fine-tuned RobBERT model that was trained to recognize the entity type 'legislation references' (REF) in Dutch case law.
Specifically, this model is a pdelobelle/robbert-v2-dutch-base model (RoBERTa architecture) that was fine-tuned on the robbert-base-v2-NER-NL-legislation-refs-data dataset.
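The model can be loaded as a standard token-classification checkpoint. A minimal sketch, assuming the model id shown on this card and that the label set follows the BILOU scheme described under Results:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Model id as shown on this card.
model_id = "romjansen/robbert-base-v2-NER-NL-legislation-refs"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Inspect the label set of the classification head (expected to follow the
# BILOU scheme described below, e.g. B-REF / I-REF / L-REF / U-REF / O).
print(model.config.id2label)
```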
Training procedure
Dataset
This model was fine-tuned on the robbert-base-v2-NER-NL-legislation-refs-data dataset. The dataset consists of 512-token examples, each of which contains one or more legislation references. These examples were created from a weakly labelled corpus of Dutch case law scraped from Linked Data Overheid, which was pre-tokenized and labelled with spaCy (using biluo_tags_from_offsets) and subsequently tokenized with Hugging Face's AutoTokenizer.from_pretrained() using pdelobelle/robbert-v2-dutch-base's tokenizer.
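A minimal sketch of this labelling and tokenization pipeline, assuming a hypothetical example sentence and entity offsets. The card names spaCy's biluo_tags_from_offsets, which was renamed to offsets_to_biluo_tags in spaCy v3:

```python
import spacy
from spacy.training import offsets_to_biluo_tags  # biluo_tags_from_offsets in spaCy v2
from transformers import AutoTokenizer

# Hypothetical weakly labelled example; in the actual dataset the character
# offsets of each reference come from the scraped Linked Data Overheid corpus.
text = "Op grond van artikel 7:658 BW is de werkgever aansprakelijk."
entities = [(13, 29, "REF")]  # character offsets of "artikel 7:658 BW"

# Word-level pre-tokenization and BILOU labelling with spaCy.
nlp = spacy.blank("nl")
doc = nlp(text)
biluo_tags = offsets_to_biluo_tags(doc, entities)  # e.g. O, ..., B-REF, I-REF, L-REF, O, ...

# Sub-word tokenization with the RobBERT tokenizer; encoding.word_ids() can then
# be used to project the word-level BILOU tags onto the sub-word tokens.
tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
encoding = tokenizer(text, truncation=True, max_length=512)

print(list(zip([token.text for token in doc], biluo_tags)))
```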
Results
| Model   | Precision | Recall | F1-score |
|---------|-----------|--------|----------|
| RobBERT | 0.874     | 0.903  | 0.888    |
This model can be quickly tested on the provided examples using Hugging Face's hosted inference API widget. Note that the widget incorrectly presents the last token of a legislation reference as a separate entity, which is due to its 'simple' aggregation_strategy: while this model was fine-tuned on training data labelled in accordance with the BILOU scheme, the hosted inference API only merges adjacent B- and I- tags of the same entity type and therefore leaves the L- tag of each reference as a separate entity.
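For local inference, the fragment split off by the 'simple' strategy can be re-attached with a small post-processing step. A minimal sketch, assuming a hypothetical example sentence and the model id shown on this card:

```python
from transformers import pipeline

model_id = "romjansen/robbert-base-v2-NER-NL-legislation-refs"
ner = pipeline("token-classification", model=model_id, aggregation_strategy="simple")

# Hypothetical example sentence containing a legislation reference.
text = "Gelet op artikel 6:162 van het Burgerlijk Wetboek wordt de vordering toegewezen."
groups = ner(text)

# Re-attach the trailing L- token that the 'simple' strategy splits off by
# merging adjacent groups of the same entity type.
merged = []
for group in groups:
    previous = merged[-1] if merged else None
    if previous and group["entity_group"] == previous["entity_group"] and group["start"] <= previous["end"] + 1:
        previous["end"] = group["end"]
        previous["word"] = text[previous["start"]:previous["end"]]
    else:
        merged.append(dict(group))

for entity in merged:
    print(entity["entity_group"], repr(text[entity["start"]:entity["end"]]))
```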
Limitations and biases
More information needed
BibTeX entry and citation info
More information needed