---
tags:
- generated_from_keras_callback
model-index:
- name: distilBERT-Nepali
  results:
  - task:
      type: Nepali-Language-Modelling
      name: Masked Language Modelling
    dataset:
      type: raygx/Nepali-Extended-Text-Corpus
      name: Nepali Language Corpus
    metrics:
    - type: PPL
      value: 17.31
      name: Perplexity
datasets:
- raygx/Nepali-Extended-Text-Corpus
- cc100
metrics:
- perplexity
language:
- ne
---
# distilBERT-Nepali

This model is a fine-tuned version of [raygx/distilBERT-Nepali](https://huggingface.co/raygx/distilBERT-Nepali) at revision b35360e0cffb71ae18aaf4ea00ff8369964243a2 (an earlier revision of this same repository).

It achieves the following results on the evaluation set:

Perplexity:
> - lowest: 17.31
> - average: 19.12

(Both a lowest and an average perplexity are reported because, with the limited resources available, training was run on successive batches of the data.)

Loss:
> - loss: 3.2503
> - val_loss: 3.0674

## Model description

This model is trained on the [raygx/Nepali-Extended-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Extended-Text-Corpus) dataset,
which is a mixture of cc100 and [raygx/Nepali-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Text-Corpus).
This model is therefore trained on 10 times more data than its predecessor.
The tokenizer has also changed, so this is effectively a completely different model.
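
As a quick reference, here is a minimal usage sketch with the Transformers fill-mask pipeline (the example sentence and the `top_k` value are illustrative, not taken from the original evaluation):

```python
from transformers import pipeline

# Load the masked-language-modelling checkpoint from the Hub.
fill_mask = pipeline("fill-mask", model="raygx/distilBERT-Nepali")

# The mask token string depends on the tokenizer, so query it rather than hard-coding it.
text = f"नेपाल एक सुन्दर {fill_mask.tokenizer.mask_token} हो।"  # "Nepal is a beautiful ___."

for prediction in fill_mask(text, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 4))
```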
## Training procedure

Training was performed by running one epoch at a time on one batch (shard) of the data.
The corpus was split into 3 batches and each batch was trained for 2 epochs, for a total of 6 training rounds.
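
A rough sketch of this shard-by-shard training loop is shown below. It is illustrative only: the `text` column name, the batch size, and the masking probability are assumptions, not values from the original training script.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, TFAutoModelForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("raygx/distilBERT-Nepali")
model = TFAutoModelForMaskedLM.from_pretrained("raygx/distilBERT-Nepali")
model.compile(optimizer="adam")  # see the optimizer configuration below

# Tokenize the corpus (assumes a plain `text` column).
corpus = load_dataset("raygx/Nepali-Extended-Text-Corpus", split="train")
corpus = corpus.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=corpus.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15,
                                           return_tensors="np")

# 2 epochs over 3 shards of the data = 6 training rounds.
for epoch in range(2):
    for i in range(3):
        shard = corpus.shard(num_shards=3, index=i)
        model.fit(model.prepare_tf_dataset(shard, batch_size=16, shuffle=True,
                                           collate_fn=collator),
                  epochs=1)
```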
### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': 16760, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: mixed_float16
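
The serialized optimizer above has the structure produced by Transformers' `create_optimizer` helper (AdamWeightDecay with linear warmup followed by polynomial decay). A sketch of how an equivalent configuration could be rebuilt from the values listed, as a reconstruction rather than the original training code:

```python
import tensorflow as tf
from transformers import create_optimizer

# Matches `training_precision: mixed_float16`.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay: 1000 warmup steps up to 5e-5, then linear (power=1.0)
# decay to 0 over 16760 total steps, with 0.01 weight decay.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=16760,
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    power=1.0,
)
```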
### Training results

Perplexity:
- lowest: 17.31
- average: 19.12

Loss (one line per training round):
- loss: 4.8605 - val_loss: 4.0510 - Perplexity: 56.96
- loss: 3.8504 - val_loss: 3.5142 - Perplexity: 33.65
- loss: 3.4918 - val_loss: 3.2408 - Perplexity: 25.64
- loss: 3.2503 - val_loss: 3.0674 - Perplexity: 21.56
- loss: 3.1324 - val_loss: 2.9243 - Perplexity: 18.49
- loss: 3.2503 - val_loss: 3.0674 - Perplexity: 17.31
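
The perplexity values appear to track the exponential of the evaluation cross-entropy loss. A minimal sketch of that relationship follows; the exact evaluation split is not documented, so these approximations do not reproduce the table exactly.

```python
import math

# Perplexity for a language model is exp(mean cross-entropy loss in nats).
val_losses = [4.0510, 3.5142, 3.2408, 3.0674, 2.9243]

for loss in val_losses:
    print(f"val_loss = {loss:.4f}  ->  perplexity ~ {math.exp(loss):.2f}")
```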
### Framework versions

- Transformers 4.30.2
- TensorFlow 2.12.0
- Datasets 2.1.0
- Tokenizers 0.13.3