---
tags:
- generated_from_keras_callback
model-index:
- name: distilBERT-Nepali
  results:
  - task:
      type: Nepali-Language-Modelling
      name: Masked Language Modelling
    dataset:
      type: raygx/Nepali-Extended-Text-Corpus
      name: Nepali Language Corpus
    metrics:
    - type: PPL
      value: 17.31
      name: Perplexity
datasets:
- raygx/Nepali-Extended-Text-Corpus
- cc100
metrics:
- perplexity
language:
- ne
---
<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->
# distilBERT-Nepali
This model is a fine-tuned version of raygx/distilBERT-Nepali (revision b35360e0cffb71ae18aaf4ea00ff8369964243a2).
It achieves the following results on the evaluation set:
Perplexity:
> - lowest: 17.31
> - average: 19.12

(Both a lowest and an average value are reported because training was done in batches of data, due to the limited resources available.)
Loss:
> - loss: 3.2503
> - val_loss: 3.0674
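
Perplexity for (masked) language modelling is conventionally the exponential of the cross-entropy loss, which appears to be how the values above relate to the reported losses. A quick sanity check in Python under that assumption:

```python
import math

# Under the usual convention perplexity = exp(cross-entropy loss); for example,
# the round with val_loss 3.0674 maps to exp(3.0674) ≈ 21.5, close to the 21.56
# listed for that round in the training results below.
print(math.exp(3.0674))
```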
## Model description
This model is trained on the [raygx/Nepali-Extended-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Extended-Text-Corpus) dataset,
a mixture of cc100 and [raygx/Nepali-Text-Corpus](https://huggingface.co/datasets/raygx/Nepali-Text-Corpus).
It is therefore trained on about 10 times more data than its predecessor.
The tokenizer has also changed, so this checkpoint is effectively a completely different model from the previous one.
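
Since this is a masked language model, it can be tried out with the standard fill-mask workflow. The snippet below is a minimal sketch, assuming the TensorFlow weights published under raygx/distilBERT-Nepali on the Hub; the Nepali example sentence is purely illustrative.

```python
from transformers import AutoTokenizer, TFAutoModelForMaskedLM, pipeline

# Assumes the TensorFlow checkpoint published as raygx/distilBERT-Nepali.
tokenizer = AutoTokenizer.from_pretrained("raygx/distilBERT-Nepali")
model = TFAutoModelForMaskedLM.from_pretrained("raygx/distilBERT-Nepali")

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Illustrative sentence with the tokenizer's mask token in place of one word.
text = f"नेपाल एक सुन्दर {tokenizer.mask_token} हो।"
for prediction in fill_mask(text):
    print(prediction["token_str"], round(prediction["score"], 4))
```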
## Training procedure
Training was done by running one epoch at a time on one batch (shard) of the data.
The corpus was split into 3 such batches and each was trained for 2 epochs, so training ran for a total of 6 rounds; a sketch of this loop is shown below.
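
The following is a minimal sketch of that schedule rather than the exact training script: `corpus_shards` stands in for three hypothetical pre-tokenized `tf.data.Dataset` shards, `eval_dataset` for the held-out evaluation split, and `optimizer` for the AdamWeightDecay optimizer sketched after the hyperparameter list below. The exact ordering of shards and epochs is an assumption.

```python
from transformers import TFAutoModelForMaskedLM

# Start from the previous revision of the same model, as stated above.
model = TFAutoModelForMaskedLM.from_pretrained(
    "raygx/distilBERT-Nepali",
    revision="b35360e0cffb71ae18aaf4ea00ff8369964243a2",
)
model.compile(optimizer=optimizer)  # optimizer built as sketched below

for epoch in range(2):              # 2 epochs over the whole corpus
    for shard in corpus_shards:     # 3 shards ("batches" of data) -> 6 rounds
        model.fit(shard, validation_data=eval_dataset, epochs=1)
```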
### Training hyperparameters
The following hyperparameters were used during training (an equivalent setup is sketched after this list):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': 16760, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: mixed_float16
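
The serialized optimizer config above matches what `transformers.create_optimizer` produces: AdamWeightDecay with 1,000 linear warmup steps followed by a linear (polynomial, power 1.0) decay to 0 over 16,760 steps. A minimal sketch of building an equivalent optimizer and enabling mixed-precision training:

```python
import tensorflow as tf
from transformers import create_optimizer

# Matches training_precision: mixed_float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay with linear warmup over 1,000 steps and linear decay to 0 over
# 16,760 total steps; beta_1, beta_2 and epsilon use the defaults listed above.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=16760,
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
)
```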
### Training results
Perplexity:
- lowest: 17.31
- average: 19.12
Per-round loss and perplexity:

| Round | loss   | val_loss | Perplexity |
|------:|-------:|---------:|-----------:|
| 1     | 4.8605 | 4.0510   | 56.96      |
| 2     | 3.8504 | 3.5142   | 33.65      |
| 3     | 3.4918 | 3.2408   | 25.64      |
| 4     | 3.2503 | 3.0674   | 21.56      |
| 5     | 3.1324 | 2.9243   | 18.49      |
| 6     | 3.2503 | 3.0674   | 17.31      |
### Framework versions
- Transformers 4.30.2
- TensorFlow 2.12.0
- Datasets 2.1.0
- Tokenizers 0.13.3