---
tags:
- generated_from_keras_callback
model-index:
- name: distilBERT-Nepali
  results:
  - task:
      type: Nepali-Language-Modelling
      name: Masked Language Modelling
    dataset:
      type: raygx/Nepali-Extended-Text-Corpus
      name: Nepali Language Corpus
    metrics:
    - type: PPL
      value: 17.31
      name: Perplexity
datasets:
- raygx/Nepali-Extended-Text-Corpus
- cc100
metrics:
- perplexity
language:
- ne
---
<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->
# distilBERT-Nepali
This model is a fine-tuned version of raygx/distilBERT-Nepali (revision b35360e0cffb71ae18aaf4ea00ff8369964243a2).
It achieves the following results on the evaluation set:
Perplexity:
> - lowest: 17.31
> - average: 19.12

(Both a lowest and an average value are reported because training was done in batches of data, due to the limited resources available.)
Loss:
> - loss: 3.2503
> - val_loss: 3.0674
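
Perplexity for (masked) language modelling is conventionally the exponential of the cross-entropy loss, which appears to be how the values above relate to the reported losses. A quick sanity check in Python under that assumption:

```python
import math

# Under the usual convention perplexity = exp(cross-entropy loss); for example,
# the round with val_loss 3.0674 maps to exp(3.0674) ≈ 21.5, close to the 21.56
# listed for that round in the training results below.
print(math.exp(3.0674))
```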
## Model description
This model is trained on the [raygx/Nepali-Extended-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Extended-Text-Corpus) dataset,
a mixture of cc100 and [raygx/Nepali-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Text-Corpus).
It is therefore trained on about 10 times more data than its predecessor.
The tokenizer has also changed, so this checkpoint is effectively a completely different model from the previous one.
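
Since this is a masked language model, it can be tried out with the standard fill-mask workflow. The snippet below is a minimal sketch, assuming the TensorFlow weights published under raygx/distilBERT-Nepali on the Hub; the Nepali example sentence is purely illustrative.

```python
from transformers import AutoTokenizer, TFAutoModelForMaskedLM, pipeline

# Assumes the TensorFlow checkpoint published as raygx/distilBERT-Nepali.
tokenizer = AutoTokenizer.from_pretrained("raygx/distilBERT-Nepali")
model = TFAutoModelForMaskedLM.from_pretrained("raygx/distilBERT-Nepali")

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Illustrative sentence with the tokenizer's mask token in place of one word.
text = f"नेपाल एक सुन्दर {tokenizer.mask_token} हो।"
for prediction in fill_mask(text):
    print(prediction["token_str"], round(prediction["score"], 4))
```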
## Training procedure
Training was done by running one epoch at a time on one batch (shard) of the data.
The corpus was split into 3 such batches and each was trained for 2 epochs, so training ran for a total of 6 rounds; a sketch of this loop is shown below.
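
The following is a minimal sketch of that schedule rather than the exact training script: `corpus_shards` stands in for three hypothetical pre-tokenized `tf.data.Dataset` shards, `eval_dataset` for the held-out evaluation split, and `optimizer` for the AdamWeightDecay optimizer sketched after the hyperparameter list below. The exact ordering of shards and epochs is an assumption.

```python
from transformers import TFAutoModelForMaskedLM

# Start from the previous revision of the same model, as stated above.
model = TFAutoModelForMaskedLM.from_pretrained(
    "raygx/distilBERT-Nepali",
    revision="b35360e0cffb71ae18aaf4ea00ff8369964243a2",
)
model.compile(optimizer=optimizer)  # optimizer built as sketched below

for epoch in range(2):              # 2 epochs over the whole corpus
    for shard in corpus_shards:     # 3 shards ("batches" of data) -> 6 rounds
        model.fit(shard, validation_data=eval_dataset, epochs=1)
```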
### Training hyperparameters
The following hyperparameters were used during training (an equivalent setup is sketched after this list):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': 16760, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: mixed_float16
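
The serialized optimizer config above matches what `transformers.create_optimizer` produces: AdamWeightDecay with 1,000 linear warmup steps followed by a linear (polynomial, power 1.0) decay to 0 over 16,760 steps. A minimal sketch of building an equivalent optimizer and enabling mixed-precision training:

```python
import tensorflow as tf
from transformers import create_optimizer

# Matches training_precision: mixed_float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay with linear warmup over 1,000 steps and linear decay to 0 over
# 16,760 total steps; beta_1, beta_2 and epsilon use the defaults listed above.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=16760,
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
)
```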
### Training results
Perplexity:
- lowest: 17.31
- average: 19.12
Per-round loss and perplexity:

| Round | loss   | val_loss | Perplexity |
|------:|-------:|---------:|-----------:|
| 1     | 4.8605 | 4.0510   | 56.96      |
| 2     | 3.8504 | 3.5142   | 33.65      |
| 3     | 3.4918 | 3.2408   | 25.64      |
| 4     | 3.2503 | 3.0674   | 21.56      |
| 5     | 3.1324 | 2.9243   | 18.49      |
| 6     | 3.2503 | 3.0674   | 17.31      |
### Framework versions
- Transformers 4.30.2
- TensorFlow 2.12.0
- Datasets 2.1.0
- Tokenizers 0.13.3