# text_generation_bangla_model
The model was pretrained on the BanglaCLM corpus (26.24 GB in total):
- OSCAR: 12.84 GB
- Wikipedia dump: 6.24 GB
- ProthomAlo: 3.92 GB
- Kalerkantho: 3.24 GB
## Model description
- Architecture: GPT-based causal language model (BanglaGPT; see the citation below)
- Context size: 128 tokens
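
A minimal usage sketch with the Transformers TF API (the versions pinned under Framework versions below). The model identifier here is a placeholder, not a confirmed Hub path; replace it with wherever this checkpoint is hosted.

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Placeholder model ID; substitute the actual hosted checkpoint path.
model_id = "text_generation_bangla_model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForCausalLM.from_pretrained(model_id)

# Encode a Bangla prompt and sample a continuation within the
# 128-token context window the model was trained with.
inputs = tokenizer("বাংলাদেশের রাজধানী", return_tensors="tf")
outputs = model.generate(
    **inputs,
    max_length=128,   # matches the context size above
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```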
## Training and evaluation data
The BanglaCLM dataset is split into a training set (90%) and a validation set (10%).
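
For illustration, a 90/10 split like this can be reproduced with the datasets library; the file name below is hypothetical, since BanglaCLM is not distributed as a single packaged file here.

```python
from datasets import load_dataset

# Hypothetical local dump of the BanglaCLM corpus (one document per line).
raw = load_dataset("text", data_files={"full": "banglaclm.txt"})["full"]

# 90% train / 10% validation, matching the split described above.
split = raw.train_test_split(test_size=0.10, seed=42)
train_ds, val_ds = split["train"], split["test"]
print(len(train_ds), len(val_ds))
```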
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Vocabulary size of tokenizer: 50256
- Total trainable params: 124,439,808
- Epochs: 40
- Number of training steps: 40,772,228
- Training precision: float32
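
These settings map onto the Transformers TF optimizer helper roughly as follows. This is a hedged reconstruction, not the authors' training script; `create_optimizer` builds AdamW with a linear warmup/decay schedule, and the exact schedule the authors used is an assumption.

```python
from transformers import create_optimizer

# Reconstructed from the hyperparameters listed above.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,                  # initial learning rate
    num_train_steps=40_772_228,    # total training steps
    num_warmup_steps=10_000,       # warmup steps
    weight_decay_rate=0.01,        # weight decay
)
```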
### Training results
The model achieves a perplexity score of 2.86.
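
Perplexity is the exponential of the mean per-token cross-entropy loss; the loss value below is hypothetical, chosen only to show how such a score is derived.

```python
import math

mean_nll = 1.0508           # hypothetical mean negative log-likelihood per token
print(math.exp(mean_nll))   # ≈ 2.86
```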
### Framework versions
- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.10.0
- Tokenizers 0.13.2
### Citation
If you find this model helpful, please cite the following paper:
```bibtex
@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
  year={2023},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}
}
```