nhull
/

random-forest-model

+---
+license: mit
+datasets:
+- nhull/tripadvisor-split-dataset-v2
+language:
+- en
+pipeline_tag: text-classification
+tags:
+- sentiment-analysis
+- random-forest
+- text-classification
+- hotel-reviews
+- tripadvisor
+- nlp
+---
+# Random Forest Sentiment Analysis Model
+This model is a **Random Forest** classifier trained on the **TripAdvisor sentiment analysis dataset**. It takes the text of a hotel review as input and predicts a sentiment rating from 1 to 5 stars.
+## Model Details
+- **Model Type**: Random Forest
+- **Task**: Sentiment Analysis
+- **Input**: A hotel review (text)
+- **Output**: Sentiment rating (1-5 stars)
+- **Dataset Used**: TripAdvisor sentiment dataset (`nhull/tripadvisor-split-dataset-v2`, balanced labels)
+## Intended Use
+This model is designed to classify hotel reviews by sentiment: it assigns each review a star rating from 1 to 5.
+## How to Use the Model
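+1. **Install the required dependencies**:
+    The usage example imports `huggingface_hub` as well as `joblib`. `scikit-learn` is listed on the assumption that the saved model is a scikit-learn estimator, which must be installed for `joblib` to load it.
+    ```bash
+    pip install joblib huggingface_hub scikit-learn
+    ```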
+2. **Download and load the model**:
+    The example below downloads the model file from Hugging Face, loads it with `joblib`, and predicts the sentiment of a review:
+    ```python
+    from huggingface_hub import hf_hub_download
+    import joblib
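+
+    # Download the model file from Hugging Face (replace "your-username" with the owner of the repository)
+    model_path = hf_hub_download(repo_id="your-username/random-forest-model", filename="random_forest_model.joblib")
+
+    # Load the model
+    model = joblib.load(model_path)
+
+    # Predict the sentiment of a single review
+    def predict_sentiment(review):
+        """Return the predicted star rating for one review string."""
+        # predict() is called on a batch, so wrap the review in a list and take the first result
+        return model.predict([review])[0]
+
+    review = "This hotel was fantastic. The service was great and the room was clean."
+    print(f"Predicted sentiment: {predict_sentiment(review)}")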
+    ```
+3. **The model will return a sentiment rating** between 1 and 5 stars, where:
+   - 1: Very bad
+   - 2: Bad
+   - 3: Neutral
+   - 4: Good
+   - 5: Very good
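The rating scale above can be turned into a small lookup. This is only a sketch: `RATING_DESCRIPTIONS` and `describe_rating` are hypothetical names that are not part of the model, and the helper assumes the prediction may arrive as a float such as `5.0`, as the labels in the evaluation table suggest.

```python
# Hypothetical helper (not shipped with the model): map a predicted
# star rating to the description used in the rating scale.
RATING_DESCRIPTIONS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def describe_rating(prediction):
    """Return the description for a predicted rating such as 5 or 5.0."""
    # Assumption: predictions are whole-star values, possibly stored as floats.
    stars = int(round(float(prediction)))
    if stars not in RATING_DESCRIPTIONS:
        raise ValueError(f"Expected a rating from 1 to 5, got {prediction!r}")
    return RATING_DESCRIPTIONS[stars]
```

For example, `describe_rating(predict_sentiment(review))` would turn the numeric output of the usage example into a readable label.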
+## Model Evaluation
+- **Test Accuracy**: 55.28% (8,000 test reviews, 1,600 per class)
+- **Classification Report** (Test Set):
+| Label | Precision | Recall | F1-score | Support |
+|-------|-----------|--------|----------|---------|
+| 1.0   | 0.62      | 0.78   | 0.69     | 1600    |
+| 2.0   | 0.48      | 0.38   | 0.42     | 1600    |
+| 3.0   | 0.49      | 0.40   | 0.44     | 1600    |
+| 4.0   | 0.49      | 0.46   | 0.48     | 1600    |
+| 5.0   | 0.63      | 0.74   | 0.68     | 1600    |
+| **Accuracy** | -   | -      | **0.55**  | 8000    |
+| **Macro avg** | 0.54 | 0.55   | 0.54     | 8000    |
+| **Weighted avg** | 0.54 | 0.55 | 0.54     | 8000    |
+- **Cross-validation Scores** (Random Forest, 5 folds):
+  - **Per-fold scores**: `[0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]`
+  - **Mean score**: `0.5521`
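The summary figures above can be re-derived from the per-class rows and the fold scores. The snippet below is a minimal check that uses only the numbers reported in this section. The variable names are illustrative, and the fold-to-fold standard deviation is an extra statistic that the original report does not include.

```python
from statistics import mean, stdev

# Per-class values copied from the classification report (labels 1 to 5).
precision = [0.62, 0.48, 0.49, 0.49, 0.63]
recall = [0.78, 0.38, 0.40, 0.46, 0.74]
f1 = [0.69, 0.42, 0.44, 0.48, 0.68]

# Macro averages are the unweighted means of the per-class values.
macro_precision = mean(precision)
macro_recall = mean(recall)
macro_f1 = mean(f1)

# Fold scores copied from the cross-validation list.
cv_scores = [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]
cv_mean = mean(cv_scores)
cv_spread = stdev(cv_scores)  # sample standard deviation across folds

print(f"macro precision={macro_precision:.2f} recall={macro_recall:.2f} f1={macro_f1:.2f}")
print(f"cv mean={cv_mean:.4f} stdev={cv_spread:.4f}")
```

Because every class has the same support (1,600 reviews), the macro and weighted averages coincide and the mean recall equals the overall accuracy, which is why the summary rows of the table agree with each other.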
+## Limitations
+- The model performs best on extreme ratings (F1-scores of 0.69 for 1 star and 0.68 for 5 stars) and struggles with intermediate ratings (F1-scores of 0.42 to 0.48 for 2, 3, and 4 stars).
+- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
+- The model does not handle sarcasm or humor well, and predictions for shorter reviews may be less accurate.