metadata
tags:
  - reviews
  - multi-class
  - classifier
  - text classification
  - roberta-base
widget:
  - text: >-
      This was my first time getting an Airbnb and won’t be the last! The
      location was so peaceful and quiet, perfect for a weekend getaway. The
      space was modern and clean. I was able to cook a whole breakfast buffet in
      the kitchen. The hosts were extremely helpful and friendly, 10/10 highly
      recommend! Definitely will be returning when the weather gets warmer!!
  - text: >-
      We went for a weekend to be out in nature with our kids and a friend. The
      house is very cute inside and decorated nicely BUT the property photos
      leave out a house right next-door, so not private, a messy yard area w
      broken down sheds and construction, a gun range close by so all we could
      hear was gunshots all day, the kitchen cabinets esp the pantry were dirty
      and filled w junk and the hot tub was foggy, dirty and they must have just
      dumped a lot of bleach in rather than balancing the chemicals and cleaning
      it properly because everyone got rashes/eye irritation/headaches and had
      to get out and shower. The house really only sleeps five and you are stuck
      scrounging for pillows blankets and sheets and blowing up an aero bed for
      anyone else. The first one had a leak so we had to find a second and do it
      all again. We could not find a trundle bed. I really wanted to like it as
      cute as the pictures are but the real thing leaves a lot to be desired.
  - text: Was quiet and nice

Jupyter Notebooks

GitHub link : lihuicham/airbnb-helpfulness-classifier

Fine-tuning Python code in finetuning.ipynb

Team Members (S001 - Synthetic Expert Team E) :

Li Hui Cham, Isaac Sparrow, Christopher Arraya, Nicholas Wong, Lei Zhang, Leonard Yang

Description

This model is a helpfulness classifier for AirBnB reviews. It assigns each review on the AirBnB website a helpfulness grade, from most helpful (A) to least helpful (C).
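As a rough illustration of how the classifier might be called, the sketch below uses the `transformers` text-classification pipeline. The repository ID is a hypothetical placeholder, `classify_reviews` and `count_grades` are illustrative helper names, and the assumption that the model returns the labels A, B and C directly (rather than generic names such as `LABEL_0`) is not confirmed by this card.

```python
from collections import Counter

# Hypothetical placeholder: replace with this model's actual Hub repository ID.
MODEL_ID = "your-username/your-model-name"


def classify_reviews(reviews, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per review."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True cuts long reviews down to the model's maximum input length.
    return classifier(list(reviews), truncation=True)


def count_grades(predictions):
    """Count how many reviews received each predicted grade."""
    return dict(Counter(p["label"] for p in predictions))


# Example with hand-written predictions (assumes the labels are A/B/C):
example = [
    {"label": "A", "score": 0.91},
    {"label": "C", "score": 0.84},
    {"label": "A", "score": 0.77},
]
print(count_grades(example))
```

Calling `classify_reviews(["Was quiet and nice"])` would download the model and return a single prediction, provided `MODEL_ID` points at a real repository.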

Pre-trained LLM

Our project fine-tuned FacebookAI/roberta-base for multi-class text (sequence) classification.
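A minimal sketch of how the base checkpoint could be loaded with a three-class classification head is shown below. The label order (A = 0, B = 1, C = 2) and the helper name `load_base_model` are assumptions made for illustration; the actual setup is in finetuning.ipynb.

```python
# Assumed label order; this card only states that grades run from A to C.
LABELS = ["A", "B", "C"]
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}


def load_base_model(checkpoint="FacebookAI/roberta-base"):
    """Load the tokenizer and a RoBERTa model with a fresh 3-class head."""
    # Imported lazily so the label mappings can be used without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Storing `id2label` and `label2id` in the model config means predictions are reported with grade names instead of numeric class indices.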

Dataset

5000 samples were scraped from the AirBnB website based on listing_id values from this Kaggle AirBnB Listings & Reviews dataset. Samples were translated from French to English.

Training Set : 4560 samples synthetically labelled by GPT-4 Turbo. The labelling cost was approximately $60.

Test/Evaluation Set : 500 samples labelled manually by two groups (each group labelled 250 samples), with the majority vote applied. A scoring rubric (shown below) was used for labelling.
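The majority-vote step for the manually labelled set can be sketched as follows. The number of annotators per review and the handling of ties are not stated in this card, so the three-vote examples and the choice to return `None` on a tie are assumptions; `majority_label` is an illustrative name.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common grade among the votes, or None on a tie."""
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    # A tie for first place has no majority; leave it for manual resolution.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


print(majority_label(["A", "B", "A"]))  # two of three annotators chose A
print(majority_label(["A", "B"]))       # tie, so no label is assigned
```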

Training Details

hyperparameters =  {'learning_rate': 3e-05,
                    'per_device_train_batch_size': 16,
                    'weight_decay': 1e-04,
                    'num_train_epochs': 4,
                    'warmup_steps': 500}

We trained our model on Colab Pro, which cost us approximately 56 compute units.
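To put the hyperparameters in context, the sketch below works out how many optimizer steps they imply for the 4560-sample training set and shows one way they could be passed to `transformers.TrainingArguments`. It assumes a single device and no gradient accumulation, neither of which is stated in this card; `build_training_arguments` and its `output_dir` default are hypothetical.

```python
import math

hyperparameters = {
    "learning_rate": 3e-05,
    "per_device_train_batch_size": 16,
    "weight_decay": 1e-04,
    "num_train_epochs": 4,
    "warmup_steps": 500,
}

TRAIN_SAMPLES = 4560  # size of the training set reported in the Dataset section

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(
    TRAIN_SAMPLES / hyperparameters["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * hyperparameters["num_train_epochs"]
warmup_fraction = hyperparameters["warmup_steps"] / total_steps

print(steps_per_epoch, total_steps, round(warmup_fraction, 2))


def build_training_arguments(output_dir="./results"):
    """Wrap the hyperparameters in a TrainingArguments object."""
    # Imported lazily; output_dir is a hypothetical default for illustration.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **hyperparameters)
```

Under these assumptions the run has 285 steps per epoch and 1140 steps in total, so the 500 warmup steps cover roughly 44% of training.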

Slides

image/png

image/png

image/png

image/png

image/png