---
license: apache-2.0
datasets:
- SungJoo/KBMC
language:
- ko
library_name: transformers
tags:
- medical
- NER
---

# Model Card for medical-ner-koelectra

## Model Summary

This model is a fine-tuned version of [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator).

We fine-tuned the model on the [Korean Bio-Medical Corpus (KBMC)](https://huggingface.co/datasets/SungJoo/KBMC) and the [Naver X Changwon Univ NER dataset](https://ko-nlp.github.io/Korpora/ko-docs/corpuslist/naver_changwon_ner.html).

## Model Details

### Model Description

- **Developed by:** Sungjoo Byun (Grace Byun)
- **Language(s) (NLP):** Korean
- **License:** Apache 2.0
- **Finetuned from model:** monologg/koelectra-base-v3-discriminator
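Since the metadata lists `transformers` as the library, the model can presumably be loaded with the standard token-classification pipeline. The following is a minimal sketch, not an official usage recipe: the repository id `SungJoo/medical-ner-koelectra` is an assumption inferred from the card title and the dataset namespace, and `build_ner_pipeline` and `format_entities` are hypothetical helper names introduced here for illustration.

```python
from typing import Any, Dict, List

# Assumption: this Hub repository id is a guess based on the card title;
# replace it with the actual id of the model.
MODEL_ID = "SungJoo/medical-ner-koelectra"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so format_entities can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word tokens into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def format_entities(results: List[Dict[str, Any]]) -> List[str]:
    """Render aggregated pipeline output as 'text [LABEL] score' strings."""
    return [f"{r['word']} [{r['entity_group']}] {r['score']:.2f}" for r in results]


# Example call (requires network access to download the model):
# ner = build_ner_pipeline()
# print(format_entities(ner("환자는 당뇨병 진단을 받았다.")))
```

The entity labels returned should correspond to the classes listed in the class-wise performance table, although the exact label strings depend on the model's configuration.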

## Training Data

The model was trained on the [Naver X Changwon Univ NER dataset](https://ko-nlp.github.io/Korpora/ko-docs/corpuslist/naver_changwon_ner.html) and the [Korean Bio-Medical Corpus (KBMC)](https://huggingface.co/datasets/SungJoo/KBMC).

# Model Performance

## Overall Metrics

- **F1 Score:** 0.8886
- **Loss:** 0.2949
- **Precision:** 0.8844
- **Recall:** 0.8928
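As a quick consistency check, the reported F1 score can be recomputed from the precision and recall listed above. This sketch assumes the usual definition of F1 as the harmonic mean of precision and recall; `f1_score` is a helper name introduced here, not part of any evaluation script from the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the overall metrics above.
precision, recall = 0.8844, 0.8928
f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.8886
```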

## Class-wise Performance

| Class       | Precision | Recall | F1-Score | Support |
|-------------|-----------|--------|----------|---------|
| AFW         | 0.6676    | 0.6326 | 0.6496   | 362     |
| ANM         | 0.7476    | 0.7800 | 0.7635   | 600     |
| Body        | 0.9731    | 0.9813 | 0.9772   | 1068    |
| CVL         | 0.8492    | 0.8579 | 0.8536   | 4977    |
| DAT         | 0.9078    | 0.9286 | 0.9181   | 2130    |
| Disease     | 0.9738    | 0.9872 | 0.9805   | 2109    |
| EVT         | 0.7332    | 0.7446 | 0.7389   | 1026    |
| FLD         | 0.6138    | 0.6170 | 0.6154   | 188     |
| LOC         | 0.8721    | 0.8691 | 0.8706   | 1734    |
| MAT         | 0.5385    | 0.5000 | 0.5185   | 14      |
| NUM         | 0.9227    | 0.9305 | 0.9266   | 4660    |
| ORG         | 0.8917    | 0.8866 | 0.8892   | 3307    |
| PER         | 0.8918    | 0.9049 | 0.8983   | 3626    |
| PLT         | 0.2941    | 0.2174 | 0.2500   | 23      |
| TIM         | 0.8644    | 0.9173 | 0.8901   | 278     |
| Treatment   | 0.9468    | 0.9852 | 0.9656   | 271     |

## Averages

| Metric         | Micro Avg | Macro Avg | Weighted Avg |
|----------------|-----------|-----------|--------------|
| Precision      | 0.8844    | 0.7930    | 0.8841       |
| Recall         | 0.8928    | 0.7963    | 0.8928       |
| F1-Score       | 0.8886    | 0.7941    | 0.8884       |
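The macro and weighted averages can be recomputed from the class-wise table. The sketch below assumes the conventional definitions (macro: unweighted mean over classes; weighted: mean weighted by each class's support), as used in common classification reports. `ROWS`, `macro_avg`, and `weighted_avg` are names introduced here for illustration.

```python
from typing import List, Tuple

# (class, precision, recall, f1, support), copied from the class-wise table.
Row = Tuple[str, float, float, float, int]

ROWS: List[Row] = [
    ("AFW", 0.6676, 0.6326, 0.6496, 362),
    ("ANM", 0.7476, 0.7800, 0.7635, 600),
    ("Body", 0.9731, 0.9813, 0.9772, 1068),
    ("CVL", 0.8492, 0.8579, 0.8536, 4977),
    ("DAT", 0.9078, 0.9286, 0.9181, 2130),
    ("Disease", 0.9738, 0.9872, 0.9805, 2109),
    ("EVT", 0.7332, 0.7446, 0.7389, 1026),
    ("FLD", 0.6138, 0.6170, 0.6154, 188),
    ("LOC", 0.8721, 0.8691, 0.8706, 1734),
    ("MAT", 0.5385, 0.5000, 0.5185, 14),
    ("NUM", 0.9227, 0.9305, 0.9266, 4660),
    ("ORG", 0.8917, 0.8866, 0.8892, 3307),
    ("PER", 0.8918, 0.9049, 0.8983, 3626),
    ("PLT", 0.2941, 0.2174, 0.2500, 23),
    ("TIM", 0.8644, 0.9173, 0.8901, 278),
    ("Treatment", 0.9468, 0.9852, 0.9656, 271),
]


def macro_avg(rows: List[Row], col: int) -> float:
    """Unweighted mean of one metric column over all classes."""
    return sum(r[col] for r in rows) / len(rows)


def weighted_avg(rows: List[Row], col: int) -> float:
    """Mean of one metric column weighted by class support (last column)."""
    total = sum(r[4] for r in rows)
    return sum(r[col] * r[4] for r in rows) / total


for name, col in (("Precision", 1), ("Recall", 2), ("F1-Score", 3)):
    print(f"{name}: macro={macro_avg(ROWS, col):.4f} "
          f"weighted={weighted_avg(ROWS, col):.4f}")
```

Under these definitions the recomputed values agree with the table to four decimal places. The macro averages sit well below the micro and weighted ones because low-support classes such as PLT and MAT count as much as CVL or NUM in the macro mean. Support-weighted recall is algebraically the same quantity as micro recall, which is why both columns show 0.8928.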

## Citations

Please cite our KBMC paper:

```bibtex
@misc{byun2024korean,
      title={Korean Bio-Medical Corpus (KBMC) for Medical Named Entity Recognition},
      author={Sungjoo Byun and Jiseung Hong and Sumin Park and Dongjun Jang and Jean Seo and Minseok Kim and Chaeyoung Oh and Hyopil Shin},
      year={2024},
      eprint={2403.16158},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Model Card Contact

For any questions or issues, please contact [email protected].