sagorsarker committed
Commit b445fdf • 1 Parent(s): 1318029
Update README.md
README.md
CHANGED
@@ -54,7 +54,7 @@ print(response)

 ## Hardware and Software

-**Training Factors:** We used the [llama-factory]() training library, a cloud GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on cloud infrastructure.
+**Training Factors:** We used the [llama-factory](https://github.com/hiyouga/LLaMA-Factory) training library, a cloud GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on cloud infrastructure.


 ## Training Data
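The fixed link points at the LLaMA-Factory training library named in **Training Factors**. For orientation only, the sketch below shows how a continual-pretraining (`stage: pt`) run is typically launched with that library; the config file name, dataset name, output path, and hyperparameters are illustrative assumptions, not values taken from this repository or commit.

```python
# Hypothetical sketch only: launching a continual-pretraining run with LLaMA-Factory.
# Dataset name, output path, and hyperparameters are illustrative assumptions,
# not values taken from this repository or commit.
import subprocess
import yaml

config = {
    "model_name_or_path": "google/gemma-2-2b",  # base model this card builds on
    "stage": "pt",                               # plain causal-LM pretraining stage
    "do_train": True,
    "finetuning_type": "full",                   # full-parameter training rather than LoRA
    "dataset": "bangla_corpus",                  # placeholder; must be registered in data/dataset_info.json
    "cutoff_len": 2048,
    "output_dir": "outputs/titulm-gemma-2-2b",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2.0e-5,
    "num_train_epochs": 1.0,
    "bf16": True,
}

# LLaMA-Factory reads a YAML config and is launched through its CLI entry point.
with open("pretrain_gemma2.yaml", "w") as f:
    yaml.safe_dump(config, f)

subprocess.run(["llamafactory-cli", "train", "pretrain_gemma2.yaml"], check=True)
```

Only the overall shape of the run matters here; the actual cluster values can be substituted for any of the keys above.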
@@ -96,7 +96,7 @@ We evaluated the models on the following datasets:

 ### Evaluation Results

-#### Evaluation
+#### Evaluation on Bangla Benchmark Datasets
 - **gemma-2-2b** shows stronger performance in **Bangla MMLU** and **BoolQ BN** in the 0-shot setting.
 - **titulm-gemma-2-2b-v1.1** performs better in **Commonsense QA BN**, **OpenBook QA BN**, and **PIQA BN** across both 0-shot and 5-shot settings.
 - In the 5-shot setting, **titulm-gemma-2-2b-v1.1** leads in **BoolQ BN**, **Commonsense QA BN**, and **OpenBook QA BN**.
@@ -109,9 +109,10 @@ We evaluated the models on the following datasets:
 | titulm-gemma-2-2b-v1.1 | 0-shot | 0.30 | 0.61 | **0.31** | **0.35** | **0.62** |
 | | 5-shot | 0.35 | **0.57** | **0.40** | **0.38** | 0.60 |

-#### Evaluation
+#### Evaluation on English Benchmark Datasets
 - **gemma-2-2b** consistently achieves the highest scores across all tasks in both 0-shot and 5-shot settings, leading in **MMLU**, **BoolQ**, **Commonsense QA**, **OpenBook QA**, and **PIQA**, with a maximum 5-shot score of **0.80** in **PIQA**.
 - **titulm-gemma-2-2b-v1.1** performs well but trails behind **gemma-2-2b**, particularly in **Commonsense QA** and **OpenBook QA**, with its best scores slightly lower across all tasks.
+- This is expected, as we trained it only on Bangla text.

 | Model | Shots | MMLU | BoolQ | Commonsense QA | OpenBook QA | PIQA |
 |--------------------------------------|--------|--------------|------------|--------------------|-----------------|-----------|
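The scores in these tables are 0-shot and 5-shot accuracies. The card does not state which evaluation harness produced them, so the snippet below is only a hedged illustration of how such numbers are commonly obtained with EleutherAI's lm-evaluation-harness; the harness choice, model id, batch size, and task list (Bangla tasks in particular would need custom task configs) are assumptions, not details from this commit.

```python
# Hedged illustration of how 0-shot vs. 5-shot accuracies are commonly produced with
# EleutherAI's lm-evaluation-harness. The card does not say which harness was used,
# so the harness choice, model id, batch size, and task list are all assumptions.
import lm_eval

MODEL_ID = "hishab/titulm-gemma-2-2b-v1.1"  # assumed Hub id; adjust if the repo lives elsewhere

for shots in (0, 5):
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={MODEL_ID},dtype=bfloat16",
        tasks=["mmlu", "boolq", "openbookqa", "piqa"],  # English benchmarks from the table
        num_fewshot=shots,
        batch_size=8,
    )
    for task, metrics in results["results"].items():
        # Metric keys vary by harness version (e.g. "acc" vs "acc,none"), so print them all.
        print(f"{shots}-shot {task}: {metrics}")
```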