---
license: mit
tags:
- generated_from_trainer
base_model: google-bert/bert-base-uncased
model-index:
- name: bert-base-aze
  results: []
---

# aLLMA-Base

**Note:** This model is not a fine-tuned version of BERT; it only uses the same architecture.

### Citation

If you use the dataset, please cite the following paper:

```bib
@inproceedings{isbarov-etal-2024-open,
    title = "Open foundation models for {A}zerbaijani language",
    author = "Isbarov, Jafar and Huseynova, Kavsar and Mammadov, Elvin and Hajili, Mammad and Ataman, Duygu",
    editor = {Ataman, Duygu and Derin, Mehmet Oguz and Ivanova, Sardana and K{\"o}ksal, Abdullatif and S{\"a}lev{\"a}, Jonne and Zeyrek, Deniz},
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.sigturk-1.2",
    pages = "18--28",
    abstract = "The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.",
}
```

https://arxiv.org/abs/2407.02337

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 22
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 88
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10000
- num_epochs: 10
- mixed_precision_training: Native AMP

### Framework versions

- Transformers 4.37.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
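
### Reading the training hyperparameters

The sketch below shows how the values listed under "Training hyperparameters" relate to each other. It is an illustration, not the training script used for this model. Two things are assumptions: the card does not state the number of devices (the figures 22 × 4 = 88 are consistent with a single device), and it does not state the total number of optimizer steps, so `TOTAL_STEPS` is a hypothetical placeholder. The schedule is assumed to have the common shape of linear warmup followed by a half-cosine decay to zero.

```python
import math

# Values copied from the hyperparameter list in this card.
TRAIN_BATCH_SIZE = 22
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 5e-05
WARMUP_STEPS = 10_000

# Hypothetical: the card does not report the total number of optimizer steps.
TOTAL_STEPS = 100_000


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer update.

    num_devices=1 is an assumption; the card does not state it.
    """
    return per_device * accumulation_steps * num_devices


def lr_at_step(step, total_steps, base_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS):
    """Learning rate under linear warmup followed by half-cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 88
for step in (0, 5_000, 10_000, 55_000, 100_000):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

With these settings the learning rate reaches its peak of 5e-05 only after 10,000 warmup steps, so short runs would spend most of their time below the nominal rate.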