---
language: zh
tags:
- sentiment-analysis
- pytorch
widget:
- text: "房间非常非常小,内窗,特别不透气,因为夜里走廊灯光是亮的,内窗对着走廊,窗帘又不能完全拉死,怎么都会有一道光射进来。"
- text: "尽快有洗衣房就好了。"
- text: "很好,干净整洁,交通方便。"
- text: "干净整洁很好"
---
# Note

A BERT-based sentiment-analysis model, fine-tuned from https://huggingface.co/IDEA-CCNL/Erlangshen-Roberta-330M-Sentiment.

The model was fine-tuned on a **Chinese dataset of human-written hotel reviews**.
# Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

MODEL = "tezign/Erlangshen-Sentiment-FineTune"

# Load the fine-tuned tokenizer and sequence-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, trust_remote_code=True)

# Wrap them in a text-classification pipeline and score a single review.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

result = classifier("很好,干净整洁,交通方便。")
print(result)

# Expected output:
# [{'label': 'Positive', 'score': 0.989660382270813}]
```
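
The same classifier can also be built with the high-level `pipeline()` helper and applied to a batch of reviews. This is a minimal sketch, assuming `trust_remote_code` is passed through exactly as in the snippet above; the example sentences are taken from the widget list:

```python
from transformers import pipeline

MODEL = "tezign/Erlangshen-Sentiment-FineTune"

# Build the classifier through the high-level pipeline() helper.
classifier = pipeline("text-classification", model=MODEL, tokenizer=MODEL, trust_remote_code=True)

# The pipeline accepts a list of reviews and returns one prediction per input.
reviews = [
    "很好,干净整洁,交通方便。",
    "尽快有洗衣房就好了。",
]
for review, prediction in zip(reviews, classifier(reviews)):
    print(review, "->", prediction["label"], round(prediction["score"], 4))
```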

# Evaluate

We compared the performance of **our fine-tuned model** and the **original Erlangshen model** on the **hotel-review test dataset** (5429 negative reviews and 1251 positive reviews).

The results show that our model substantially improved the precision and recall for positive reviews:
```text
Our finetune model:
              precision    recall  f1-score   support

    Negative       0.99      0.98      0.98      5429
    Positive       0.92      0.95      0.93      1251

    accuracy                           0.97      6680
   macro avg       0.95      0.96      0.96      6680
weighted avg       0.97      0.97      0.97      6680

======================================================

Original Erlangshen model:
              precision    recall  f1-score   support

    Negative       0.81      1.00      0.90      5429
    Positive       0.00      0.00      0.00      1251

    accuracy                           0.81      6680
   macro avg       0.41      0.50      0.45      6680
weighted avg       0.66      0.81      0.73      6680
```
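
The reports above appear to follow scikit-learn's `classification_report` format. A minimal sketch of how such a report can be produced, assuming the labelled test reviews are available as two Python lists; `test_texts` and `test_labels` are illustrative placeholders, and loading the actual 6680-review test set is not shown:

```python
from sklearn.metrics import classification_report
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="tezign/Erlangshen-Sentiment-FineTune",
    trust_remote_code=True,
)

# Placeholder data: in the real evaluation these hold the labelled hotel reviews.
test_texts = ["很好,干净整洁,交通方便。", "尽快有洗衣房就好了。"]
test_labels = ["Positive", "Negative"]

# Predict a label for every review and compare against the gold labels.
predicted_labels = [prediction["label"] for prediction in classifier(test_texts)]
print(classification_report(test_labels, predicted_labels, digits=2))
```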