# text_generation_bangla_model
The model was pretrained on the BanglaCLM corpus (26.24 GB in total):
- OSCAR: 12.84 GB
- Wikipedia dump: 6.24 GB
- ProthomAlo: 3.92 GB
- Kalerkantho: 3.24 GB
## Model description
- Architecture: GPT-based causal language model (BanglaGPT; see the citation below)
- Context size: 128 tokens
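
A minimal usage sketch with the Transformers TF API (the versions pinned under Framework versions below). The model identifier here is a placeholder, not a confirmed Hub path; replace it with wherever this checkpoint is hosted.

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Placeholder model ID; substitute the actual hosted checkpoint path.
model_id = "text_generation_bangla_model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForCausalLM.from_pretrained(model_id)

# Encode a Bangla prompt and sample a continuation within the
# 128-token context window the model was trained with.
inputs = tokenizer("বাংলাদেশের রাজধানী", return_tensors="tf")
outputs = model.generate(
    **inputs,
    max_length=128,   # matches the context size above
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```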
## Training and evaluation data
The BanglaCLM dataset is split into a training set (90%) and a validation set (10%).
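
For illustration, a 90/10 split like this can be reproduced with the datasets library; the file name below is hypothetical, since BanglaCLM is not distributed as a single packaged file here.

```python
from datasets import load_dataset

# Hypothetical local dump of the BanglaCLM corpus (one document per line).
raw = load_dataset("text", data_files={"full": "banglaclm.txt"})["full"]

# 90% train / 10% validation, matching the split described above.
split = raw.train_test_split(test_size=0.10, seed=42)
train_ds, val_ds = split["train"], split["test"]
print(len(train_ds), len(val_ds))
```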
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Vocabulary size of tokenizer: 50256
- Total trainable params: 124,439,808
- Epochs: 40
- Number of training steps: 40,772,228
- Training precision: float32
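
These settings map onto the Transformers TF optimizer helper roughly as follows. This is a hedged reconstruction, not the authors' training script; `create_optimizer` builds AdamW with a linear warmup/decay schedule, and the exact schedule the authors used is an assumption.

```python
from transformers import create_optimizer

# Reconstructed from the hyperparameters listed above.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,                  # initial learning rate
    num_train_steps=40_772_228,    # total training steps
    num_warmup_steps=10_000,       # warmup steps
    weight_decay_rate=0.01,        # weight decay
)
```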
### Training results
The model achieves a perplexity score of 2.86.
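
Perplexity is the exponential of the mean per-token cross-entropy loss; the loss value below is hypothetical, chosen only to show how such a score is derived.

```python
import math

mean_nll = 1.0508           # hypothetical mean negative log-likelihood per token
print(math.exp(mean_nll))   # ≈ 2.86
```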
### Framework versions
- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.10.0
- Tokenizers 0.13.2
### Citation
If you find this model helpful, please cite the following paper:
```bibtex
@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
  year={2023},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}
}
```