---
language:
- en
tags:
- classics
- citation mining
widget:
- text: "Homer's Iliad opens with an invocation to the muse (1. 1)."
---
### Model and entities
`roberta_classics_ner` is a domain-specific RoBERTa-based model for named entity recognition in Classical Studies. It recognises the following bibliographical entities:
| id | label | description | Example |
| --- | ------------- | ------------------------------------------- | --------------------- |
| 0 | 'O' | Out of entity | |
| 1 | 'B-AAUTHOR' | Ancient authors | *Herodotus* |
| 2 | 'I-AAUTHOR' | | |
| 3 | 'B-AWORK' | The title of an ancient work | *Symposium*, *Aeneid* |
| 4 | 'I-AWORK' | | |
| 5 | 'B-REFAUWORK' | A structured reference to an ancient work | *Homer, Il.* |
| 6 | 'I-REFAUWORK' | | |
| 7 | 'B-REFSCOPE' | The scope of a reference | *II.1.993a30–b11* |
| 8 | 'I-REFSCOPE' | | |
| 9 | 'B-FRAGREF' | A reference to fragmentary texts or scholia | *Frag. 19. West* |
| 10 | 'I-FRAGREF' | | |
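The labels follow the IOB2 (BIO) scheme: a `B-` tag marks the first token of an entity and an `I-` tag marks its continuation. As a minimal sketch, the inventory above can be written as an `id2label` mapping, the form in which Hugging Face model configs store label names, and used to check that a tag sequence is well-formed. The exact label strings in the released config, and the `is_valid_bio` helper, are assumptions for illustration only.

```python
# Label inventory transcribed from the table above
# (assumed, not verified, to match the model's config.json).
ID2LABEL = {
    0: "O",
    1: "B-AAUTHOR", 2: "I-AAUTHOR",
    3: "B-AWORK", 4: "I-AWORK",
    5: "B-REFAUWORK", 6: "I-REFAUWORK",
    7: "B-REFSCOPE", 8: "I-REFSCOPE",
    9: "B-FRAGREF", 10: "I-FRAGREF",
}
LABEL2ID = {label: i for i, label in ID2LABEL.items()}


def is_valid_bio(tags):
    """Hypothetical helper: True if every I-X tag continues a B-X or I-X tag of the same type."""
    previous = "O"
    for tag in tags:
        if tag not in LABEL2ID:
            raise ValueError(f"unknown label: {tag!r}")
        if tag.startswith("I-"):
            # An inside tag must follow a tag of the same entity type.
            if previous == "O" or previous[2:] != tag[2:]:
                return False
        previous = tag
    return True


print(is_valid_bio(["B-AAUTHOR", "O", "B-AWORK", "I-AWORK"]))  # True
print(is_valid_bio(["O", "I-REFSCOPE"]))                        # False
```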
### Example
```
B-AAUTHOR    B-AWORK                                      B-REFSCOPE
Homer     's Iliad opens with an invocation to the muse ( 1. 1).
```
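Turning token-level tags like those in the example into entity spans means merging each `B-` tag with the `I-` tags of the same type that follow it. The sketch below assumes the whitespace tokenisation shown in the example and tags every unlabelled token `O`. It also assumes the scope `1. 1` is annotated as `B-REFSCOPE` followed by `I-REFSCOPE`. The `bio_to_spans` function is hypothetical and is not part of the model's code.

```python
def bio_to_spans(tokens, tags):
    """Group IOB2 tags into (entity_type, start, end, text) spans; `end` is exclusive."""
    spans = []
    current = None  # [entity_type, start_index] of the span being built
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current[0] != tag[2:])
        )
        if starts_new:
            # Close any open span and open a new one (a stray I- tag is treated like B-).
            if current is not None:
                spans.append((current[0], current[1], i))
            current = [tag[2:], i]
        elif tag == "O":
            if current is not None:
                spans.append((current[0], current[1], i))
                current = None
        # Otherwise: an I- tag continuing the open span of the same type.
    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return [(etype, start, end, " ".join(tokens[start:end])) for etype, start, end in spans]


tokens = ["Homer", "'s", "Iliad", "opens", "with", "an", "invocation",
          "to", "the", "muse", "(", "1.", "1", ")", "."]
# Assumed tagging: unlabelled tokens are "O"; the scope "1. 1" spans two tokens.
tags = ["B-AAUTHOR", "O", "B-AWORK", "O", "O", "O", "O",
        "O", "O", "O", "O", "B-REFSCOPE", "I-REFSCOPE", "O", "O"]

for span in bio_to_spans(tokens, tags):
    print(span)
# ('AAUTHOR', 0, 1, 'Homer')
# ('AWORK', 2, 3, 'Iliad')
# ('REFSCOPE', 11, 13, '1. 1')
```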
### Dataset
`roberta_classics_ner` was fine-tuned and evaluated on `EpiBau`, a dataset that has not yet been publicly released. It is composed of four volumes of [Structures of Epic Poetry](https://www.epische-bauformen.uni-rostock.de/), a compendium on the narrative patterns and structural elements in ancient epic.
Word and entity counts for each split of `EpiBau` are as follows:
| | train-set | dev-set | test-set |
| -------------- | --------- | ------- | -------- |
| word count | 712462 | 125729 | 122324 |
| AAUTHOR | 4436 | 1368 | 1511 |
| AWORK | 3145 | 780 | 670 |
| REFAUWORK | 5102 | 988 | 1209 |
| REFSCOPE | 14768 | 3193 | 2847 |
| FRAGREF | 266 | 29 | 33 |
| total entities | 13822 | 1415 | 2419 |
### Results
The model was developed in the context of the experiments reported [on EPFL Infoscience](http://infoscience.epfl.ch/record/291236?&ln=en). Trained and tested on `EpiBau` with an 85-15 split, the model achieves an overall micro-averaged F1 score of **.82**. Detailed per-entity scores are displayed below. Evaluation was performed with the [CLEF-HIPE-scorer](https://github.com/impresso/CLEF-HIPE-2020-scorer) in strict mode.
| metric | AAUTHOR | AWORK | REFSCOPE | REFAUWORK |
| --------- | ------- | ----- | -------- | --------- |
| F1 | .819 | .796 | .863 | .756 |
| Precision | .842 | .818 | .860 | .755 |
| Recall    | .797    | .766  | .866     | .756      |
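In strict mode, a predicted entity counts as correct only if its type and both of its boundaries exactly match a gold entity. Micro-averaging pools these counts over all entity types before precision, recall and F1 (their harmonic mean) are computed. The function below is a simplified sketch of that computation, not the CLEF-HIPE scorer itself. It assumes gold and predicted entities are given as `(type, start, end)` tuples, and the names `f1` and `strict_micro_prf` are hypothetical.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_micro_prf(gold, predicted):
    """Micro-averaged P/R/F1 where a hit requires an identical (type, start, end) tuple."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)   # exact type and boundary matches
    fp = len(pred_set - gold_set)   # predictions with no exact gold counterpart
    fn = len(gold_set - pred_set)   # gold entities missed or mis-bounded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)


gold = [("AAUTHOR", 0, 1), ("AWORK", 2, 3), ("REFSCOPE", 11, 13)]
# The scope is predicted with a truncated boundary, so strict mode
# counts it as one false positive and one false negative.
predicted = [("AAUTHOR", 0, 1), ("AWORK", 2, 3), ("REFSCOPE", 11, 12)]
print([round(x, 3) for x in strict_micro_prf(gold, predicted)])  # [0.667, 0.667, 0.667]

# Consistency check against the AAUTHOR column: P = .842, R = .797.
print(round(f1(0.842, 0.797), 3))  # 0.819
```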
Questions, remarks, help or contributions? Get in touch [on GitHub](https://github.com/AjaxMultiCommentary); we'll be happy to chat!