---
tags:
- generated_from_keras_callback
model-index:
- name: distilBERT-Nepali
  results:
  - task:
      type: Nepali-Language-Modelling
      name: Masked Language Modelling
    dataset:
      type: raygx/Nepali-Extended-Text-Corpus
      name: Nepali Language Corpus
    metrics:
    - type: PPL
      value: 17.31
      name: Perplexity
datasets:
- raygx/Nepali-Extended-Text-Corpus
- cc100
metrics:
- perplexity
language:
- ne
---

# distilBERT-Nepali

This model is a fine-tuned version of raygx/distilBERT-Nepali, revision b35360e0cffb71ae18aaf4ea00ff8369964243a2.

It achieves the following results on the evaluation set:

Perplexity:
> - lowest: 17.31
> - average: 19.12 (an average is reported because training was done on batches of data due to the limited resources available)

Loss:
> - loss: 3.2503
> - val_loss: 3.0674

## Model description

This model is trained on the [raygx/Nepali-Extended-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Extended-Text-Corpus) dataset, a mixture of cc100 and [raygx/Nepali-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Text-Corpus). This model is therefore trained on roughly 10 times more data than its previous version. The tokenizer has also changed, so this is effectively a different model. A minimal usage sketch is included at the end of this card.

## Training procedure

Training was done by running one epoch at a time on one batch of the data. With 3 data batches and 2 epochs each, training ran for a total of 6 rounds.

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': 16760, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: mixed_float16

A sketch of recreating this optimizer is given at the end of this card.

### Training results

Perplexity:
- lowest: 17.31
- average: 19.12

| Round | Loss   | Validation Loss | Perplexity |
|:-----:|:------:|:---------------:|:----------:|
| 1     | 4.8605 | 4.0510          | 56.96      |
| 2     | 3.8504 | 3.5142          | 33.65      |
| 3     | 3.4918 | 3.2408          | 25.64      |
| 4     | 3.2503 | 3.0674          | 21.56      |
| 5     | 3.1324 | 2.9243          | 18.49      |
| 6     | 3.2503 | 3.0674          | 17.31      |

### Framework versions

- Transformers 4.30.2
- TensorFlow 2.12.0
- Datasets 2.1.0
- Tokenizers 0.13.3
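As a reference for the optimizer configuration listed under training hyperparameters, here is a minimal sketch of recreating it with `transformers.create_optimizer`. It assumes that the 16760 polynomial-decay steps plus the 1000 warmup steps make up the total number of training steps; this is an illustration, not the exact training script used for this model.

```python
import tensorflow as tf
from transformers import create_optimizer

# Mixed-precision training, matching training_precision: mixed_float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay with linear warmup followed by a linear (power=1.0)
# polynomial decay to 0, mirroring the serialized optimizer config above.
# Assumption: total steps = 16760 decay steps + 1000 warmup steps.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=16760 + 1000,
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    power=1.0,
)
```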
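A minimal usage sketch for masked-token prediction with this model, assuming the checkpoint is published under the repo id `raygx/distilBERT-Nepali` and includes TensorFlow weights:

```python
from transformers import pipeline

# Assumed repo id; adjust if the checkpoint is hosted under a different name.
fill_mask = pipeline("fill-mask", model="raygx/distilBERT-Nepali", framework="tf")

# Build an example sentence ("Nepal is a [MASK] country.") using whatever
# mask token this model's tokenizer defines.
mask = fill_mask.tokenizer.mask_token
print(fill_mask(f"नेपाल एक {mask} देश हो।"))
```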