sagorsarker committed on
Commit
b445fdf
1 Parent(s): 1318029

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -54,7 +54,7 @@ print(response)
 
 ## Hardware and Software
 
-**Training Factors:** We used the [llama-factory]() training library, a cloud GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on cloud infrastructure.
+**Training Factors:** We used the [llama-factory](https://github.com/hiyouga/LLaMA-Factory) training library, a cloud GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on cloud infrastructure.
 
 
 ## Training Data
@@ -96,7 +96,7 @@ We evaluated the models on the following datasets:
 
 ### Evaluation Results
 
-#### Evaluation on Bangla Benchmark datasets
+#### Evaluation of Bangla Benchmark datasets
 - **gemma-2-2b** shows stronger performance in **Bangla MMLU** and **BoolQ BN** in the 0-shot setting.
 - **titulm-gemma-2-2b-v1.1** performs better in **Commonsense QA BN**, **OpenBook QA BN**, and **PIQA BN** across both 0-shot and 5-shot settings.
 - In the 5-shot setting, **titulm-gemma-2-2b-v1.1** leads in **BoolQ BN**, **Commonsense QA BN**, and **OpenBook QA BN**.
@@ -109,9 +109,10 @@ We evaluated the models on the following datasets:
 | titulm-gemma-2-2b-v1.1 | 0-shot | 0.30 | 0.61 | **0.31** | **0.35** | **0.62** |
 | | 5-shot | 0.35 | **0.57** | **0.40** | **0.38** | 0.60 |
 
-#### Evaluation on English Benchmark datasets
+#### Evaluation of English Benchmark datasets
 - **gemma-2-2b** consistently achieves the highest scores across all tasks in both 0-shot and 5-shot settings, leading in **MMLU**, **BoolQ**, **Commonsense QA**, **OpenBook QA**, and **PIQA**, with a maximum 5-shot score of **0.80** in **PIQA**.
 - **titulm-gemma-2-2b-v1.1** performs well but trails behind **gemma-2-2b**, particularly in **Commonsense QA** and **OpenBook QA**, with the best scores being slightly lower across all tasks.
+- This is expected, as the model was trained only on Bangla text.
 
 | Model | Shots | MMLU | BoolQ | Commonsense QA | OpenBook QA | PIQA |
 |--------------------------------------|--------|--------------|------------|--------------------|-----------------|-----------|
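For context on the 0-shot vs 5-shot settings used in the tables above, here is a minimal sketch of how few-shot prompts are typically assembled. The question/answer template and the exemplar data are illustrative assumptions for demonstration only; they are not the actual template or data used by the evaluation harness.

```python
# Illustrative sketch: 0-shot vs 5-shot prompt construction.
# The "Question:/Answer:" template below is an assumption, not the
# exact format used to produce the benchmark scores above.

def build_prompt(question: str, exemplars: list[tuple[str, str]], n_shots: int) -> str:
    """Prepend up to n_shots worked examples before the target question."""
    parts = []
    for q, a in exemplars[:n_shots]:
        parts.append(f"Question: {q}\nAnswer: {a}")
    # The target question ends with an open "Answer:" for the model to complete.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Hypothetical exemplars standing in for held-out benchmark examples.
exemplars = [(f"example question {i}", f"example answer {i}") for i in range(5)]

zero_shot = build_prompt("What is the capital of Bangladesh?", exemplars, n_shots=0)
five_shot = build_prompt("What is the capital of Bangladesh?", exemplars, n_shots=5)

print(zero_shot.count("Answer:"))  # 1: only the target question
print(five_shot.count("Answer:"))  # 6: five exemplars plus the target
```

In the 0-shot setting the model sees only the bare question; in the 5-shot setting it first sees five solved examples, which is why the two settings can rank models differently, as the tables above show.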