metadata

library_name: transformers
tags:
  - nlp
  - classification
license: apache-2.0
datasets:
  - zeyadusf/daigt
language:
  - en
base_model:
  - FacebookAI/roberta-base
pipeline_tag: text-classification

Model Card for Model ID

Model Details

eval_loss: 0.02619364485144615
eval_accuracy: 0.9941391941391942
eval_f1-score: 0.9941391909936754
epoch: 2.0

Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.99      0.99      1365
           1       0.99      1.00      0.99      1365

    accuracy                           0.99      2730
   macro avg       0.99      0.99      0.99      2730
weighted avg       0.99      0.99      0.99      2730

Clean Function

I used this function to clean the input text when testing the model manually, and the model gave good results on the cleaned text.

import html
import re


def clean_text(text):
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)
    # Replace HTML entities with their corresponding characters
    text = html.unescape(text)
    # Collapse runs of whitespace into single spaces and trim the ends
    text = re.sub(r'\s+', ' ', text).strip()
    # Drop every character that is not an ASCII letter, digit, or whitespace
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Removing characters can leave double spaces behind, so collapse them again
    return re.sub(r'\s\s+', ' ', text)
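The cleaning step could be combined with inference roughly as follows. This is a minimal sketch, not the exact script used for the manual tests: `classify_texts` and `load_classifier` are hypothetical helper names, the model id is a placeholder you would replace with this repository's id, and the cleaning steps from the function above are repeated so the snippet runs on its own. The card does not state which class the labels 0 and 1 stand for, so the sketch returns the raw predictions unchanged.

```python
import html
import re


def clean_text(text):
    # Same cleaning steps as the function above, repeated so this sketch is self-contained
    text = re.sub(r'<.*?>', '', text)
    text = html.unescape(text)
    text = re.sub(r'\s+', ' ', text).strip()
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    return re.sub(r'\s\s+', ' ', text)


def load_classifier(model_id):
    # Hypothetical helper: builds a text-classification pipeline for the given model id.
    # The import is local so the cleaning code can be used without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def classify_texts(texts, classifier):
    # Hypothetical helper: clean every input, then pass the cleaned batch to the classifier.
    # `classifier` is any callable that maps a list of strings to a list of predictions.
    cleaned = [clean_text(t) for t in texts]
    predictions = classifier(cleaned)
    return list(zip(cleaned, predictions))
```

Assuming the model is published on the Hub, usage would look like `classify_texts(["<p>Some essay text</p>"], load_classifier("your-username/your-model-id"))`, where the model id is a placeholder. Each returned pair holds the cleaned text and the pipeline's prediction for it.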