
distilBERT-Nepali

This model is a fine-tuned version of raygx/distilBERT-Nepali (revision b35360e0cffb71ae18aaf4ea00ff8369964243a2).

It achieves the following results on the evaluation set:

Perplexity:

  • lowest: 17.31
  • average: 19.12

(Multiple values are reported because training was done in separate batches of data, owing to the limited resources available.)

Loss:

  • loss: 3.2503
  • val_loss: 3.0674
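
For reference, the perplexities on this card appear to follow the usual convention for language models of taking the exponential of the cross-entropy loss. A quick illustrative check (not part of the original training code):

```python
import math

# Perplexity of a language model is conventionally exp(cross-entropy loss).
val_loss = 3.0674
print(f"{math.exp(val_loss):.2f}")  # ~21.48, close to the 21.56 reported for that round
```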

Model description

This model is trained on the raygx/Nepali-Extended-Text-Corpus dataset, which is a mixture of cc100 and raygx/Nepali-Text-Corpus. As a result, it is trained on roughly 10 times more data than its previous version. The tokenizer has also changed, so this is effectively an entirely different model.
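
A minimal usage sketch with the fill-mask pipeline; the framework argument and the example sentence are assumptions, and the mask token is looked up from the bundled tokenizer rather than hard-coded:

```python
from transformers import pipeline

# Fill-mask inference with this checkpoint; framework="tf" because the card lists TensorFlow.
fill_mask = pipeline("fill-mask", model="raygx/distilBERT-Nepali", framework="tf")

# The mask token depends on the tokenizer shipped with this checkpoint.
mask = fill_mask.tokenizer.mask_token
for prediction in fill_mask(f"नेपाल एक सुन्दर {mask} हो ।"):
    print(prediction["token_str"], round(prediction["score"], 4))
```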

Training procedure

Training was done by running one epoch at a time on one batch of the data. The corpus was split into 3 batches, and each batch was trained for 2 epochs, giving 6 training rounds in total. A rough sketch of this loop is given below.
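
The loop below is an illustrative reconstruction of that procedure, not the author's actual script: the dataset and model IDs come from this card, while the text column name, sequence length, batch size, and use of prepare_tf_dataset are assumptions.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, TFAutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, create_optimizer)

tokenizer = AutoTokenizer.from_pretrained("raygx/distilBERT-Nepali")
model = TFAutoModelForMaskedLM.from_pretrained("raygx/distilBERT-Nepali")

corpus = load_dataset("raygx/Nepali-Extended-Text-Corpus", split="train")

def tokenize(batch):
    # The "text" column name is an assumption about the dataset schema.
    return tokenizer(batch["text"], truncation=True, max_length=512)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           return_tensors="np")

# Optimizer matching the hyperparameters listed in the next section.
optimizer, _ = create_optimizer(init_lr=5e-5, num_train_steps=16760,
                                num_warmup_steps=1000, weight_decay_rate=0.01)
model.compile(optimizer=optimizer)

# 3 data batches x 2 epochs each = 6 training rounds.
for i in range(3):
    shard = corpus.shard(num_shards=3, index=i)
    shard = shard.map(tokenize, batched=True, remove_columns=shard.column_names)
    tf_data = model.prepare_tf_dataset(shard, collate_fn=collator,
                                       batch_size=16, shuffle=True)
    model.fit(tf_data, epochs=2)
```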

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': 16760, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
  • training_precision: mixed_float16
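
The serialized optimizer config above corresponds to AdamWeightDecay with a 1000-step warmup followed by a linear (power=1.0) decay to 0 over 16760 steps and a weight decay rate of 0.01. Assuming the transformers helper was used, it can be recreated roughly as follows, together with the mixed_float16 policy:

```python
import tensorflow as tf
from transformers import create_optimizer

# training_precision: mixed_float16
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay + WarmUp/PolynomialDecay schedule matching the config above.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=16760,
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
)
```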

Training results

Perplexity:

  • lowest: 17.31
  • average: 19.12

Loss and perplexity by training round:

| Round | Train loss | Validation loss | Perplexity |
|------:|-----------:|----------------:|-----------:|
| 1     | 4.8605     | 4.0510          | 56.96      |
| 2     | 3.8504     | 3.5142          | 33.65      |
| 3     | 3.4918     | 3.2408          | 25.64      |
| 4     | 3.2503     | 3.0674          | 21.56      |
| 5     | 3.1324     | 2.9243          | 18.49      |
| 6     | 3.2503     | 3.0674          | 17.31      |

Framework versions

  • Transformers 4.30.2
  • TensorFlow 2.12.0
  • Datasets 2.1.0
  • Tokenizers 0.13.3