---
library_name: transformers
tags:
- nlp
- classification
license: apache-2.0
datasets:
- zeyadusf/daigt
language:
- en
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
---

# Model Card for Model ID

This model is a fine-tuned version of `FacebookAI/roberta-base` for binary text classification in English, trained on the `zeyadusf/daigt` dataset.

## Model Details

Evaluation metrics:

- eval_loss: 0.02619364485144615
- eval_accuracy: 0.9941391941391942
- eval_f1-score: 0.9941391909936754
- epoch: 2.0

```
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.99      0.99      1365
           1       0.99      1.00      0.99      1365

    accuracy                           0.99      2730
   macro avg       0.99      0.99      0.99      2730
weighted avg       0.99      0.99      0.99      2730
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65cf6fed0954f06e47d97e56/iTsf9imsdWtxtRaiMa5On.png)

#### Clean Function

- I used this function when testing the model manually, and cleaning the input text with it gave good results.

```python
import re
import html

def clean_text(text):
    # Remove HTML tags
    clean = re.compile(r'<.*?>')
    text = re.sub(clean, '', text)
    # Replace HTML entities with their corresponding characters
    text = html.unescape(text)
    # Remove extra whitespace and normalize spaces
    text = re.sub(r'\s+', ' ', text).strip()
    # Drop every character that is not a letter, digit, or whitespace
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Collapse runs of whitespace left behind by the removal above
    return re.sub(r"\s\s+", " ", text)
```
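
As a rough sketch of how the cleaning step could sit in front of inference, the snippet below cleans raw strings and passes them to a `transformers` text-classification pipeline. `MODEL_ID` is a placeholder rather than this model's actual repository name, `load_classifier` and `predict` are hypothetical helper names, and the label names returned depend on the model's configuration.

```python
import html
import re

MODEL_ID = "your-username/your-model-id"  # placeholder: replace with this model's repo id


def clean_text(text):
    """Same cleaning steps as above, repeated so this snippet runs on its own."""
    text = re.sub(r"<.*?>", "", text)           # strip HTML tags
    text = html.unescape(text)                   # decode HTML entities
    text = re.sub(r"\s+", " ", text).strip()     # normalize whitespace
    text = re.sub(r"[^a-zA-Z0-9\s]", "", text)   # keep letters, digits, whitespace
    return re.sub(r"\s\s+", " ", text)           # collapse leftover whitespace runs


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (hypothetical helper)."""
    from transformers import pipeline  # imported lazily; requires `transformers`
    return pipeline("text-classification", model=model_id)


def predict(texts, classifier):
    """Clean each text, then run the classifier on the cleaned batch."""
    cleaned = [clean_text(t) for t in texts]
    # truncation=True asks the tokenizer to cut inputs that exceed the model's length limit
    return classifier(cleaned, truncation=True)


if __name__ == "__main__":
    print(clean_text("<p>Hello &amp; welcome!</p>  It's   2024."))
    # -> Hello welcome Its 2024
```

With `transformers` installed and `MODEL_ID` set to the real repository, `predict(["some text"], load_classifier())` should return one `{"label": ..., "score": ...}` dictionary per input text.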