metadata
library_name: transformers
tags:
- nlp
- classification
license: apache-2.0
datasets:
- zeyadusf/daigt
language:
- en
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
Model Card for Model ID
Model Details
- eval_loss: 0.02619364485144615
- eval_accuracy: 0.9941391941391942
- eval_f1-score: 0.9941391909936754
- epoch: 2.0
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.99      0.99      1365
           1       0.99      1.00      0.99      1365

    accuracy                           0.99      2730
   macro avg       0.99      0.99      0.99      2730
weighted avg       0.99      0.99      0.99      2730
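
For readers unfamiliar with the columns in the report, the following is a minimal sketch of how per-class precision, recall, F1, and support are derived from true and predicted labels. The helper name `per_class_metrics` and the toy label lists are illustrative assumptions; they are not the evaluation data or code behind the numbers in this card.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1, tp + fn


# Toy labels, made up purely for illustration.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]

for label in (0, 1):
    p, r, f1, support = per_class_metrics(y_true, y_pred, label)
    print(f"{label}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}  support={support}")

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f}")
```

The macro average in the report is the unweighted mean of the per-class values, while the weighted average weights each class by its support. With equal support per class, as in the report, the two coincide.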
Clean Function
- I used this function when testing the model manually, and cleaning the input text with it gave good results.
import re
import html

def clean_text(text):
    # Remove HTML tags
    clean = re.compile(r'<.*?>')
    text = re.sub(clean, '', text)
    # Replace HTML entities with their corresponding characters
    text = html.unescape(text)
    # Remove extra whitespace and normalize spaces
    text = re.sub(r'\s+', ' ', text).strip()
    # Drop everything except ASCII letters, digits, and whitespace
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Collapse the double spaces left behind by removed characters
    return re.sub(r'\s\s+', ' ', text)
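
A minimal sketch of how the cleaning step might be chained with inference, assuming the model is loaded through the `transformers` `pipeline` API. The helper names `load_classifier` and `classify` are hypothetical, and the repository ID in the commented lines is a placeholder, since this card does not state the model ID.

```python
import html
import re


def clean_text(text):
    # Same steps as the cleaning function in this card.
    text = re.sub(r'<.*?>', '', text)
    text = html.unescape(text)
    text = re.sub(r'\s+', ' ', text).strip()
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    return re.sub(r'\s\s+', ' ', text)


def load_classifier(model_id):
    # Hypothetical helper. Imported lazily so that clean_text can be used
    # without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def classify(texts, classifier):
    # Hypothetical helper. `classifier` is any callable that maps a list of
    # strings to a list of predictions, such as a transformers pipeline.
    cleaned = [clean_text(t) for t in texts]
    return list(zip(cleaned, classifier(cleaned)))


sample = "<p>Hello &amp; welcome!</p>   It's 2024."
print(clean_text(sample))  # Hello welcome Its 2024

# Placeholder repository ID; replace it with the actual model ID.
# classifier = load_classifier("your-username/your-model-id")
# print(classify([sample], classifier))
```

Note that the cleaning step removes all punctuation and non-ASCII characters, so the classifier sees a normalized version of the text rather than the raw input.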