
distilbert_classifier_newsgroup

This model is a fine-tuned version of distilbert-base-uncased on the 20 Newsgroups dataset, a collection of approximately 20,000 newsgroup documents partitioned (nearly) evenly across 20 different newsgroups. It achieves the following results on the evaluation set: loss: 0.5660, accuracy: 0.8371.

Training and evaluation data

  • The training set contains 10,182 rows with two features: text and label.
  • The evaluation set contains 7,532 rows with the same features.
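
The exact data-preparation code is not part of this card, but a split with these row counts could be produced along the following lines. This is a hedged sketch: fetch_20newsgroups and the 90/10 hold-out are assumptions that happen to reproduce the sizes above.

```python
# Hedged sketch, not code from this card: fetch_20newsgroups and the 90/10
# hold-out below are assumptions that reproduce the row counts listed above.
from datasets import Dataset
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split

train_raw = fetch_20newsgroups(subset="train")   # 11,314 documents
test_raw = fetch_20newsgroups(subset="test")     #  7,532 documents

# Keeping 90% of the official training split leaves 10,182 rows for training.
train_texts, _, train_labels, _ = train_test_split(
    train_raw.data, train_raw.target, test_size=0.1, random_state=42
)

train_ds = Dataset.from_dict({"text": train_texts, "label": list(train_labels)})
eval_ds = Dataset.from_dict({"text": test_raw.data, "label": list(test_raw.target)})
print(len(train_ds), len(eval_ds))   # 10182 7532
```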

Training procedure

  • Set up the model checkpoint as distilbert-base-uncased.
  • Initialize the starting weights from the checkpoint, specifying the number of labels (20 in this case); see the sketch after this list.
  • Train for 3 epochs.
  • Use a batch size of 16.
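
A minimal sketch of the initialization step described above; variable names are illustrative, and only the checkpoint name and label count come from this card.

```python
# Hedged sketch of the initialization step; only the checkpoint name and the
# 20-label count are taken from this card.
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Start from the pretrained weights and add a freshly initialized 20-way
# classification head on top.
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=20)
```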

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: Adam (beta_1: 0.9, beta_2: 0.999, epsilon: 1e-08, amsgrad: False, weight_decay: None, clipnorm: None, global_clipnorm: None, clipvalue: None, use_ema: False, ema_momentum: 0.99, ema_overwrite_frequency: None, jit_compile: True, is_legacy_optimizer: False)
  • learning_rate: PolynomialDecay (initial_learning_rate: 2e-05, decay_steps: 1908, end_learning_rate: 0.0, power: 1.0, cycle: False)
  • training_precision: float32
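
These values correspond to a plain Adam optimizer driving a linear (polynomial, power 1.0) learning-rate decay from 2e-05 to 0 over 1908 steps. A hedged reconstruction with standard Keras objects, using only the values listed above:

```python
# Hedged sketch recreating the listed optimizer configuration with plain Keras
# objects; only the values shown in the hyperparameter list are used.
import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-05,
    decay_steps=1908,        # = 3 epochs * 636 steps per epoch (10182 // 16)
    end_learning_rate=0.0,
    power=1.0,               # power=1.0 gives a linear decay
)
optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-08,
)
# This optimizer would then be passed to model.compile(...) before training.
```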

Training results

  • Results on the evaluation set: loss: 0.5660, accuracy: 0.8371
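
These figures are what a final Keras evaluate call would report. A hedged sketch of the compile/fit/evaluate step, reusing the model, tokenizer, optimizer, and dataset objects from the earlier sketches; the tokenization and prepare_tf_dataset details are assumptions, not code from this card.

```python
# Hedged sketch; relies on `model`, `tokenizer`, `optimizer`, `train_ds`, and
# `eval_ds` from the sketches above, and on the built-in loss that Transformers
# TF models use when compile() receives no explicit loss.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

train_tok = train_ds.map(tokenize, batched=True)
eval_tok = eval_ds.map(tokenize, batched=True)

# prepare_tf_dataset pads each batch and yields (features, labels) pairs.
tf_train = model.prepare_tf_dataset(train_tok, batch_size=16, shuffle=True, tokenizer=tokenizer)
tf_eval = model.prepare_tf_dataset(eval_tok, batch_size=16, shuffle=False, tokenizer=tokenizer)

model.compile(optimizer=optimizer, metrics=["accuracy"])
model.fit(tf_train, validation_data=tf_eval, epochs=3)

loss, accuracy = model.evaluate(tf_eval)   # reported above as ~0.566 / ~0.837
```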

Framework versions

  • Transformers 4.28.0
  • TensorFlow 2.12.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3