---
license: gemma
library_name: transformers
datasets:
- jondurbin/gutenberg-dpo-v0.1
model-index:
- name: gemma-2-Ifable-9B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 29.84
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ifable/gemma-2-Ifable-9B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 41.03
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ifable/gemma-2-Ifable-9B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 8.91
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ifable/gemma-2-Ifable-9B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 12.19
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ifable/gemma-2-Ifable-9B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.52
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ifable/gemma-2-Ifable-9B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 35.85
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ifable/gemma-2-Ifable-9B
      name: Open LLM Leaderboard
---
# ifable/gemma-2-Ifable-9B

This model ranked first on the [Creative Writing Benchmark](https://eqbench.com/creative_writing.html) on September 10, 2024.
## Training and evaluation data
- Gutenberg: https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1
- Carefully curated proprietary creative writing dataset
## Training procedure

Training method: [SimPO](https://github.com/princeton-nlp/SimPO) (Simple Preference Optimization with a Reference-Free Reward)
It achieves the following results on the evaluation set:
- Loss: 1.0163
- Rewards/chosen: -21.6822
- Rewards/rejected: -47.8754
- Rewards/accuracies: 0.9167
- Rewards/margins: 26.1931
- Logps/rejected: -4.7875
- Logps/chosen: -2.1682
- Logits/rejected: -17.0475
- Logits/chosen: -12.0041
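As an aid to reading the numbers above, here is a minimal sketch of the SimPO pairwise objective, which uses the length-normalized (average per-token) log-probability of a response as a reference-free reward. The values of β and the target margin γ are not reported in this card; β = 10 is an assumption inferred from the table, since each `Rewards/*` value is exactly 10 × the corresponding `Logps/*` value.

```python
import math

def simpo_loss(avg_logp_chosen, avg_logp_rejected, beta=10.0, gamma=0.0):
    """SimPO pairwise loss: -log sigmoid(r_chosen - r_rejected - gamma),
    where r = beta * average per-token log-probability (no reference model)."""
    reward_chosen = beta * avg_logp_chosen
    reward_rejected = beta * avg_logp_rejected
    margin = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably
    loss = math.log1p(math.exp(-margin)) if margin > -30 else -margin
    return reward_chosen, reward_rejected, loss

# Plugging in the evaluation Logps reproduces the reported rewards and margin:
rc, rr, loss = simpo_loss(-2.1682, -4.7875)
print(round(rc, 4), round(rr, 4), round(rc - rr, 4))
```

Note this recovers the reported `Rewards/chosen` (-21.6822), `Rewards/rejected` (-47.8754), and `Rewards/margins` (26.1931) up to rounding; the reported validation loss of 1.0163 is an average over the whole evaluation set, not over this single aggregate pair.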
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
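The `total_train_batch_size` above is not an independent setting; it follows arithmetically from the per-device batch size, the device count, and gradient accumulation:

```python
# Effective (total) train batch size = per-device batch * devices * accumulation steps
train_batch_size = 1
num_devices = 8
gradient_accumulation_steps = 16

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 128
```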
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.4444 | 0.9807 | 35 | 1.0163 | -21.6822 | -47.8754 | 0.9167 | 26.1931 | -4.7875 | -2.1682 | -17.0475 | -12.0041 | 0.0184 |
### Framework versions
- Transformers 4.43.4
- Pytorch 2.3.0a0+ebedce2
- Datasets 2.20.0
- Tokenizers 0.19.1
We are looking for product managers and operations managers to build applications on top of our model, as well as AI engineers to join us; we are also open to business cooperation. Contact: [email protected]
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 22.73 |
| IFEval (0-Shot) | 29.84 |
| BBH (3-Shot) | 41.03 |
| MATH Lvl 5 (4-Shot) | 8.91 |
| GPQA (0-shot) | 12.19 |
| MuSR (0-shot) | 8.52 |
| MMLU-PRO (5-shot) | 35.85 |