|
--- |
|
language: |
|
- en |
|
tags: |
|
- classics |
|
- citation mining |
|
widget: |
|
- text: "Homer's Iliad opens with an invocation to the muse (1. 1)." |
|
--- |
|
|
|
### Model and entities |
|
`roberta_classics_ner` is a domain-specific RoBERTa-based model for named entity recognition in Classical Studies. It recognises bibliographical entities, such as: |
|
|
|
|
|
| id | label | desciption | Example | |
|
| --- | ------------- | ------------------------------------------- | --------------------- | |
|
| 0 | 'O' | Out of entity | | |
|
| 1 | 'B-AAUTHOR' | Ancient authors | *Herodotus* | |
|
| 2 | 'I-AAUTHOR' | | | |
|
| 3 | 'B-AWORK' | The title of an ancient work | *Symposium*, *Aeneid* | |
|
| 4 | 'I-AWORK' | | | |
|
| 5 | 'B-REFAUWORK' | A structured reference to an ancient work | *Homer, Il.* | |
|
| 6 | 'I-REFAUWORK' | | | |
|
| 7 | 'B-REFSCOPE' | The scope of a reference | *II.1.993a30–b11* | |
|
| 8 | 'I-REFSCOPE' | | | |
|
| 9 | 'B-FRAGREF' | A reference to fragmentary texts or scholia | *Frag. 19. West* | |
|
| 10 | 'I-FRAGREF' | | | |
|
|
|
|
|
### Example |
|
|
|
|
|
``` |
|
B-AAUTHOR B-AWORK B-REFSCOPE |
|
Homer 's Iliad opens with an invocation to the muse ( 1. 1). |
|
``` |
|
|
|
|
|
|
|
### Dataset |
|
|
|
|
|
`roberta_classics_ner` was fine-tuned and evaluated on `EpiBau`, a dataset which has not been released publicly yet. It is composed of four volumes of [Structures of Epic Poetry](https://www.epische-bauformen.uni-rostock.de/), a compendium on the narrative patterns and structural elements in ancient epic. |
|
Entity counts of the `Epibau` dataset are the following: |
|
|
|
| | train-set | dev-set | test-set | |
|
| -------------- | --------- | ------- | -------- | |
|
| word count | 712462 | 125729 | 122324 | |
|
| AAUTHOR | 4436 | 1368 | 1511 | |
|
| AWORK | 3145 | 780 | 670 | |
|
| REFAUWORK | 5102 | 988 | 1209 | |
|
| REFSCOPE | 14768 | 3193 | 2847 | |
|
| FRAGREF | 266 | 29 | 33 | |
|
| total entities | 13822 | 1415 | 2419 | |
|
|
|
|
|
### Results |
|
|
|
The model was developed in the context of experiments reported [here](http://infoscience.epfl.ch/record/291236?&ln=en).Trained and tested on `EpiBau` with a 85-15 split, the model yields a general F1 score of **.82** (micro-averages). Detailed scores are displayed below. Evaluation was performed with the [CLEF-HIPE-scorer](https://github.com/impresso/CLEF-HIPE-2020-scorer), in strict mode) |
|
|
|
|
|
| metric | AAUTHOR | AWORK | REFSCOPE | REFAUWORK | |
|
| --------- | ------- | ----- | -------- | --------- | |
|
| F1 | .819 | .796 | .863 | .756 | |
|
| Precision | .842 | .818 | .860 | .755 | |
|
| Recall | .797 | .766 | .756 | .866 | |
|
|
|
|
|
|
|
|
|
Questions, remarks, help or contribution ? Get in touch [here](https://github.com/AjaxMultiCommentary), we'll be happy to chat ! |
|
|
|
|