Logistic Regression Sentiment Analysis Model

This model is a Logistic Regression classifier trained on the TripAdvisor sentiment analysis dataset. It takes the text of a hotel review as input and predicts a sentiment rating from 1 to 5 stars.

Model Details

  • Model Type: Logistic Regression
  • Task: Sentiment Analysis
  • Input: A hotel review (text)
  • Output: Sentiment rating (1-5 stars)
  • Trained Dataset: nhull/tripadvisor-split-dataset-v2

Intended Use

This model is designed to classify hotel reviews by the sentiment they express, assigning each review a star rating.

The model returns a rating between 1 and 5 stars, where:

  • 1: Very bad
  • 2: Bad
  • 3: Neutral
  • 4: Good
  • 5: Very good
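
The rating scale above can be tried out with a small sketch. This is not the published model: it assumes scikit-learn is installed, assumes TF-IDF features (the card does not say how reviews are vectorized), and trains on a handful of invented toy reviews. The names `RATING_DESCRIPTIONS`, `toy_reviews`, and `describe` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Star rating -> description, as listed in this card.
RATING_DESCRIPTIONS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}

# Invented toy data for illustration only (NOT the TripAdvisor dataset).
toy_reviews = [
    "Filthy room, rude staff, worst stay ever.",
    "Horrible experience, bugs in the bathroom.",
    "Room was small and noisy, breakfast was poor.",
    "Disappointing service and a long wait at check-in.",
    "Average hotel, nothing special but acceptable.",
    "It was okay, the location is fine and the room is plain.",
    "Nice clean room and friendly staff.",
    "Good value, comfortable bed and pleasant pool.",
    "Absolutely wonderful stay, perfect in every way.",
    "Outstanding service, stunning view, will return.",
]
toy_ratings = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

# Assumed pipeline: TF-IDF features feeding a multiclass logistic regression.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(toy_reviews, toy_ratings)


def describe(review: str) -> str:
    """Predict a 1-5 star rating and attach its description."""
    rating = int(classifier.predict([review])[0])
    return f"{rating} stars ({RATING_DESCRIPTIONS[rating]})"


print(describe("The staff were friendly and the room was clean."))
```

With so little training data the predicted rating is not meaningful; the sketch only shows how text goes in and a 1-5 label comes out.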

Dataset

The dataset used for training, validation, and testing is nhull/tripadvisor-split-dataset-v2. It consists of:

  • Training Set: 30,400 reviews
  • Validation Set: 1,600 reviews
  • Test Set: 8,000 reviews

All splits are balanced across five sentiment labels.
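
As a quick sanity check on these numbers, the sketch below works out each split's share of the data and the number of reviews per label. It assumes "balanced" means exactly equal counts for each of the five labels; the variable names are illustrative.

```python
# Split sizes as stated in this card.
SPLIT_SIZES = {"train": 30_400, "validation": 1_600, "test": 8_000}
NUM_LABELS = 5  # star ratings 1-5

total_reviews = sum(SPLIT_SIZES.values())

# Share of the full dataset held by each split.
split_shares = {name: size / total_reviews for name, size in SPLIT_SIZES.items()}

# Reviews per label, ASSUMING every label has exactly the same count.
per_label = {name: size // NUM_LABELS for name, size in SPLIT_SIZES.items()}

for name in SPLIT_SIZES:
    print(f"{name}: {split_shares[name]:.0%} of data, {per_label[name]} reviews per label")
```

The test split works out to 1,600 reviews per label, which agrees with the Support column of the classification report.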


Test Performance

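Predictions are off by 0.44 stars on average (mean absolute error). Over- and under-predictions largely cancel out: the mean signed error computed from the confusion matrix below is under 0.01 stars.

  • Test Accuracy: 61.05%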

  • Classification Report:

Label          Precision   Recall   F1-score   Support
1.0            0.70        0.73     0.71       1600
2.0            0.52        0.50     0.51       1600
3.0            0.57        0.54     0.55       1600
4.0            0.55        0.54     0.55       1600
5.0            0.71        0.74     0.72       1600
Accuracy       -           -        0.61       8000
Macro avg      0.61        0.61     0.61       8000
Weighted avg   0.61        0.61     0.61       8000

  • Confusion Matrix:

True \ Predicted      1      2      3      4      5
1                  1165    384     41      3      7
2                   432    805    315     31     17
3                    61    314    857    311     57
4                     3     48    264    870    415
5                     6     10     32    365   1187
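
The figures in the classification report can be re-derived from the confusion matrix alone. The sketch below does this in plain Python and also computes the mean absolute and mean signed error in stars; the function name `summarize` and the dictionary keys are illustrative, not part of the model's code.

```python
# Confusion matrix from this card: rows are true labels 1-5,
# columns are predicted labels 1-5.
CONFUSION = [
    [1165, 384, 41, 3, 7],
    [432, 805, 315, 31, 17],
    [61, 314, 857, 311, 57],
    [3, 48, 264, 870, 415],
    [6, 10, 32, 365, 1187],
]


def summarize(matrix):
    """Derive accuracy, per-class metrics, and star errors from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))

    per_class = {}
    for k in range(n):
        predicted_as_k = sum(matrix[i][k] for i in range(n))  # column sum
        truly_k = sum(matrix[k])                              # row sum (support)
        precision = matrix[k][k] / predicted_as_k
        recall = matrix[k][k] / truly_k
        f1 = 2 * precision * recall / (precision + recall)
        per_class[k + 1] = {"precision": precision, "recall": recall, "f1": f1}

    # Error in stars is (predicted - true); index differences equal star differences.
    signed = sum((j - i) * matrix[i][j] for i in range(n) for j in range(n))
    absolute = sum(abs(j - i) * matrix[i][j] for i in range(n) for j in range(n))

    return {
        "accuracy": correct / total,
        "per_class": per_class,
        "mean_signed_error": signed / total,
        "mean_absolute_error": absolute / total,
    }


report = summarize(CONFUSION)
print(f"accuracy: {report['accuracy']:.4f}")                        # 0.6105
print(f"mean absolute error: {report['mean_absolute_error']:.2f}")  # 0.44
print(f"mean signed error: {report['mean_signed_error']:+.3f}")     # +0.006
for label, m in report["per_class"].items():
    print(f"{label}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

A positive signed error means the predicted rating is higher than the true rating on average; the absolute error ignores direction.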

Files Included

  • validation_results_log_regression.csv: Contains correctly classified reviews with their real and predicted labels.

Limitations

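  • The model performs best on extreme ratings (F1 of 0.71 for 1 star and 0.72 for 5 stars) but struggles with intermediate ratings (F1 between 0.51 and 0.55 for 2, 3, and 4 stars). Most misclassifications land on an adjacent rating.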
  • The model was trained on the TripAdvisor dataset and may not generalize well to reviews from other sources or domains.
  • The model does not handle aspects like sarcasm or humor well, and shorter reviews may lead to less accurate predictions.