---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- random-forest
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---

# Random Forest Sentiment Analysis Model

This model is a **Random Forest** classifier trained on the **TripAdvisor sentiment analysis dataset** (`nhull/tripadvisor-split-dataset-v2`). Given the text of a hotel review, it predicts the review's sentiment as a rating from 1 to 5 stars.

## Model Details

- **Model Type**: Random Forest classifier
- **Task**: Sentiment analysis (5-class text classification)
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: TripAdvisor sentiment dataset (`nhull/tripadvisor-split-dataset-v2`, balanced labels)

## Intended Use

This model is designed to classify hotel reviews by sentiment. It assigns each review a star rating between 1 and 5 that reflects the sentiment expressed in the text.

## How to Use the Model

1. **Install the required dependencies**:
   ```bash
   pip install joblib huggingface_hub scikit-learn
   ```

2. **Download and load the model**:
   You can download the model from the Hugging Face Hub and use it to predict sentiment, as in this example:
   ```python
   from huggingface_hub import hf_hub_download
   import joblib

   # Download the model file from the Hugging Face Hub.
   # Replace the placeholder repo_id with the repository that hosts this model.
   model_path = hf_hub_download(repo_id="your-username/random-forest-model", filename="random_forest_model.joblib")

   # Load the model (scikit-learn must be installed to unpickle it)
   model = joblib.load(model_path)

   # Predict the sentiment of a single review.
   # The model is called on raw text, so the saved object is expected
   # to include its own text-vectorization step.
   def predict_sentiment(review):
       return model.predict([review])[0]

   review = "This hotel was fantastic. The service was great and the room was clean."
   print(f"Predicted sentiment: {predict_sentiment(review)}")
   ```

3. **The model returns a sentiment rating** between 1 and 5 stars, where:
   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good
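
The classification report below lists the labels as `1.0` to `5.0`, so a raw prediction may come back as a float rather than an integer. The sketch below assumes that and maps a prediction to the descriptions above; `RATING_LABELS` and `describe_rating` are hypothetical helpers for illustration, not part of the released model.

```python
# Hypothetical helper (not part of the released model): maps a numeric
# 1-5 star prediction, e.g. 5.0, to the descriptions in this model card.
RATING_LABELS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def describe_rating(prediction) -> str:
    """Return the human-readable label for a 1-5 star prediction."""
    # float() accepts ints, floats, NumPy scalars and numeric strings.
    stars = int(round(float(prediction)))
    if stars not in RATING_LABELS:
        raise ValueError(f"Unexpected rating: {prediction!r}")
    return RATING_LABELS[stars]


if __name__ == "__main__":
    # With the loaded model you would pass predict_sentiment(review) instead.
    for prediction in (1.0, 3.0, 5.0):
        print(f"{prediction} -> {describe_rating(prediction)}")
```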

## Model Evaluation

- **Test Accuracy**: 55.28% on the test set.

- **Classification Report** (Test Set):

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0 | 0.62 | 0.78 | 0.69 | 1600 |
| 2.0 | 0.48 | 0.38 | 0.42 | 1600 |
| 3.0 | 0.49 | 0.40 | 0.44 | 1600 |
| 4.0 | 0.49 | 0.46 | 0.48 | 1600 |
| 5.0 | 0.63 | 0.74 | 0.68 | 1600 |
| **Accuracy** | - | - | **0.55** | 8000 |
| **Macro avg** | 0.54 | 0.55 | 0.54 | 8000 |
| **Weighted avg** | 0.54 | 0.55 | 0.54 | 8000 |
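
The last two rows of the report follow directly from the per-class rows. The sketch below recomputes them from the rounded table values, so small rounding differences are possible. Because every class has the same support (1,600), the weighted and macro averages coincide, and the macro-averaged recall (about 0.552) matches the reported test accuracy of 55.28% up to rounding: with equal class sizes, accuracy equals the mean per-class recall.

```python
# Per-class scores copied from the classification report above
# (labels 1.0-5.0, each with support 1600). Values are rounded.
report = {
    1: {"precision": 0.62, "recall": 0.78, "f1": 0.69, "support": 1600},
    2: {"precision": 0.48, "recall": 0.38, "f1": 0.42, "support": 1600},
    3: {"precision": 0.49, "recall": 0.40, "f1": 0.44, "support": 1600},
    4: {"precision": 0.49, "recall": 0.46, "f1": 0.48, "support": 1600},
    5: {"precision": 0.63, "recall": 0.74, "f1": 0.68, "support": 1600},
}


def macro_avg(metric: str) -> float:
    """Unweighted mean of a metric over all classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(metric: str) -> float:
    """Support-weighted mean of a metric over all classes."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


if __name__ == "__main__":
    for metric in ("precision", "recall", "f1"):
        print(f"{metric:>9}: macro={macro_avg(metric):.2f} weighted={weighted_avg(metric):.2f}")
```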

- **Cross-validation Scores**:

  * **Random Forest Cross-validation scores**:
    `[0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]`

  * **Random Forest Mean Cross-validation score**:
    `0.5521`
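
The reported mean is the arithmetic mean of the five per-fold scores. This minimal sketch uses Python's standard `statistics` module to reproduce it and also reports the fold-to-fold spread, which the original results do not state.

```python
import statistics

# Per-fold cross-validation scores as reported above.
cv_scores = [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]

mean_score = statistics.mean(cv_scores)
spread = statistics.stdev(cv_scores)  # sample standard deviation across folds

print(f"Mean CV score:  {mean_score:.4f}")
print(f"Std. deviation: {spread:.4f}")
print(f"Range:          {min(cv_scores):.4f} to {max(cv_scores):.4f}")
```

The folds differ by at most about 1.4 percentage points (0.5442 to 0.5581), so the cross-validation estimate is consistent with the 55.28% test accuracy.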

## Limitations

- The model performs best on the extreme ratings (1 and 5 stars, F1-scores of 0.69 and 0.68) and struggles with the intermediate ratings (2, 3, and 4 stars, F1-scores between 0.42 and 0.48).
- The model was trained on **TripAdvisor** hotel reviews and may not generalize well to reviews from other sources or domains.
- The model does not handle sarcasm or humor well, and shorter reviews may lead to less accurate predictions.
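
The gap between extreme and intermediate ratings can be quantified from the per-class F1-scores in the evaluation table. This short sketch averages them for each group; it only reuses the reported (rounded) numbers and does not re-run the model.

```python
# F1-scores per star rating, copied from the classification report above.
f1_by_rating = {1: 0.69, 2: 0.42, 3: 0.44, 4: 0.48, 5: 0.68}

extreme = [f1_by_rating[r] for r in (1, 5)]
intermediate = [f1_by_rating[r] for r in (2, 3, 4)]

mean_extreme = sum(extreme) / len(extreme)
mean_intermediate = sum(intermediate) / len(intermediate)

print(f"Mean F1, ratings 1 and 5:    {mean_extreme:.3f}")
print(f"Mean F1, ratings 2, 3 and 4: {mean_intermediate:.3f}")
print(f"Gap:                         {mean_extreme - mean_intermediate:.3f}")
```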