Arabic Named Entity Recognition
This project aims to enrich Arabic Named Entity Recognition (ANER). Arabic is a difficult language to process and poses many challenges. We built a model based on AraBERT that supports 50 entity types.
Paper
The paper describing the system, with all the details, is available at: https://arxiv.org/abs/2308.14669
Dataset
Evaluation results
The model achieves the following results:
Dataset | Recall | Precision | F1 |
---|---|---|---|
WikiFANE Gold | 87.0 | 90.5 | 88.7 |
NewsFANE Gold | 78.1 | 77.4 | 77.7 |
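As a quick sanity check on the reported numbers, the sketch below recomputes F1 from the reported precision and recall. It assumes F1 here is the usual harmonic mean of precision and recall; the function name `f1_score` is ours and is not part of the project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1), copied from the results table.
reported = {
    "WikiFANE Gold": (87.0, 90.5, 88.7),
    "NewsFANE Gold": (78.1, 77.4, 77.7),
}

for dataset, (recall, precision, reported_f1) in reported.items():
    recomputed = round(f1_score(precision, recall), 1)
    print(f"{dataset}: recomputed F1 = {recomputed}, reported F1 = {reported_f1}")
```

Under that assumption, the recomputed values agree with the reported F1 scores to one decimal place.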
Usage
The model is available on the Hugging Face model hub under the name boda/ANER. Checkpoints are currently available only in PyTorch.
Use in Python:
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("boda/ANER")
model = AutoModelForTokenClassification.from_pretrained("boda/ANER")
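To go from the loaded checkpoint to actual entity predictions, one option is the `token-classification` pipeline from `transformers`. The following is a minimal sketch, not the authors' evaluation code: the helper names `load_tagger` and `extract_entities` are hypothetical, and the choice of `aggregation_strategy="simple"` (which merges sub-word pieces into whole entity spans) is our assumption about what is convenient here.

```python
def load_tagger(model_name: str = "boda/ANER"):
    """Build a token-classification pipeline for the given checkpoint.

    The import is local so that the helper below can be used and tested
    without transformers installed or the model downloaded.
    """
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word tokens into entity spans.
    return pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",
    )


def extract_entities(text, tagger):
    """Run the tagger on text and return (span, label, score) tuples."""
    return [
        (entity["word"], entity["entity_group"], float(entity["score"]))
        for entity in tagger(text)
    ]
```

A typical call would be `extract_entities(sentence, load_tagger())`, where `sentence` is an Arabic string. The full set of labels the checkpoint predicts can be inspected through `model.config.id2label` on the model loaded above.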
Acknowledgments
Thanks to AraBERT for providing the Arabic BERT model, which we used as the base model for our work.
We would also like to thank Prof. Fahd Saleh S Alotaibi of the Faculty of Computing and Information Technology, King Abdulaziz University, for providing the dataset we used to train our model.
Contacts
Abdelrahman Atef