amengemeda/amharic-hate-speech-detection-mBERT

Amharic Hate Speech Detection using Fine-tuned mBERT

Model description

This model was created by fine-tuning mBERT for the downstream task of hate speech detection in Amharic. The starting checkpoint is Davlan/bert-base-multilingual-cased-finetuned-amharic, which was provided by Davlan on Hugging Face. Fine-tuning was done with Hugging Face's Trainer API, for 15 epochs with a learning rate of 0.00005 (5e-5). The final model has an F1-score of 0.9172 and an accuracy of 91.59%.
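The setup described above can be sketched roughly as follows. This is a minimal illustration and not the code from the original notebook: the helper names (`accuracy_and_f1`, `build_training_args`, `classify`) and the output directory are hypothetical, only the two hyperparameters stated above are set, and the metric helper assumes a binary task with label `1` as the positive class, which this card does not confirm. The averaging mode behind the reported F1-score is not stated either, so the numbers it produces may not be directly comparable.

```python
from typing import Dict, List, Sequence

# Model identifiers taken from this model card.
MODEL_ID = "amengemeda/amharic-hate-speech-detection-mBERT"
BASE_MODEL_ID = "Davlan/bert-base-multilingual-cased-finetuned-amharic"

# Hyperparameters stated in this model card.
NUM_EPOCHS = 15
LEARNING_RATE = 5e-5


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int],
                    positive: int = 1) -> Dict[str, float]:
    """Accuracy and binary F1 for the `positive` class, in plain Python.

    Assumption: a two-class task where `positive` marks the class of interest.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": correct / len(y_true), "f1": f1}


def build_training_args(output_dir: str = "mbert-amharic-hate-speech"):
    """Build Trainer arguments with the stated epochs and learning rate.

    `output_dir` is a hypothetical path; every other setting keeps the
    library default, which may differ from the original run.
    """
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=NUM_EPOCHS,
        learning_rate=LEARNING_RATE,
    )


def classify(texts: Sequence[str]) -> List[dict]:
    """Run the fine-tuned model on a batch of Amharic texts.

    Downloads the model from the Hugging Face Hub on first use. The label
    names in the output come from the model's own config.
    """
    from transformers import pipeline  # imported lazily

    classifier = pipeline("text-classification", model=MODEL_ID)
    return classifier(list(texts))
```

Calling `classify(["..."])` with Amharic text returns one dictionary per input containing a `label` and a `score`. The `transformers` imports sit inside the functions so that the metric helper can be used without the library installed.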

Dataset description

The fine-tuning was done on an Amharic dataset made available on Mendeley Data (https://data.mendeley.com/datasets/ymtmxx385m). The dataset has 30,000 rows.

Other

The Google Colab notebook is available on my GitHub: https://github.com/amengemeda/ISproject-2/blob/main/mBERT/Amharic_Hate_Speech_detection_using_mBERT_(Trainer_API).ipynb