---
license: cc-by-nc-4.0
base_model: mlabonne/NeuralMonarch-7B
tags:
- generated_from_trainer
- axolotl
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: AlphaMonarch-laser
  results: []
datasets:
- argilla/OpenHermes2.5-dpo-binarized-alpha
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# AlphaMonarch-laser

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/62S_ExHO6NKCM3NhPDrds.jpeg)

AlphaMonarch-laser is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/) on the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset that achieves better performance than [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B/) thanks to LaserQLoRA: we fine-tuned only half of the projections, yet obtained better results than the version released by Maxime Labonne. The model was trained for 1080 steps.

AlphaMonarch-laser ranks first on YALL - [Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/Jgxw1FZRx7nNAdSh7nYt1.png)
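## 💻 Usage

The model was aligned on ChatML-formatted preference data (see the Axolotl configuration below), so it can be queried through 🤗 Transformers with a ChatML chat template. The snippet below is a minimal sketch rather than official usage code: the repo id placeholder, dtype, and sampling settings are assumptions, and the tokenizer is assumed to ship a ChatML chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlphaMonarch-laser"  # replace with the full Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "What is a large language model?"}]

# Assumes the tokenizer ships a ChatML chat template, as the Axolotl config suggests.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,  # illustrative sampling settings
    top_p=0.95,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```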
## 🏆 Evaluation results

### Nous Benchmark

#### AGIEVAL

| Task                           | Version | Metric   | Value  | StdErr |
|--------------------------------|---------|----------|--------|--------|
| agieval_aqua_rat               | 0       | acc      | 28.35% | 2.83%  |
| agieval_aqua_rat               | 0       | acc_norm | 26.38% | 2.77%  |
| agieval_logiqa_en              | 0       | acc      | 38.25% | 1.91%  |
| agieval_logiqa_en              | 0       | acc_norm | 38.10% | 1.90%  |
| agieval_lsat_ar                | 0       | acc      | 23.91% | 2.82%  |
| agieval_lsat_ar                | 0       | acc_norm | 23.48% | 2.80%  |
| agieval_lsat_lr                | 0       | acc      | 52.75% | 2.21%  |
| agieval_lsat_lr                | 0       | acc_norm | 53.92% | 2.21%  |
| agieval_lsat_rc                | 0       | acc      | 66.91% | 2.87%  |
| agieval_lsat_rc                | 0       | acc_norm | 67.29% | 2.87%  |
| agieval_sat_en                 | 0       | acc      | 78.64% | 2.86%  |
| agieval_sat_en                 | 0       | acc_norm | 78.64% | 2.86%  |
| agieval_sat_en_without_passage | 0       | acc      | 45.15% | 3.48%  |
| agieval_sat_en_without_passage | 0       | acc_norm | 44.17% | 3.47%  |
| agieval_sat_math               | 0       | acc      | 33.18% | 3.18%  |
| agieval_sat_math               | 0       | acc_norm | 31.36% | 3.14%  |

Average: 28.41%

#### GPT4ALL

| Task          | Version | Metric   | Value  | StdErr  |
|---------------|---------|----------|--------|---------|
| arc_challenge | 0       | acc      | 66.30% | ± 1.38% |
|               |         | acc_norm | 68.26% | ± 1.36% |
| arc_easy      | 0       | acc      | 86.57% | ± 0.70% |
|               |         | acc_norm | 80.81% | ± 0.81% |
| boolq         | 1       | acc      | 87.16% | ± 0.59% |
| hellaswag     | 0       | acc      | 69.60% | ± 0.46% |
|               |         | acc_norm | 87.45% | ± 0.33% |
| openbookqa    | 0       | acc      | 39.20% | ± 2.19% |
|               |         | acc_norm | 49.60% | ± 2.24% |
| piqa          | 0       | acc      | 83.03% | ± 0.88% |
|               |         | acc_norm | 84.87% | ± 0.84% |
| winogrande    | 0       | acc      | 81.06% | ± 1.10% |

Average: 76.98%

#### TRUTHFUL-QA

| Task          | Version | Metric | Value  | StdErr  |
|---------------|---------|--------|--------|---------|
| truthfulqa_mc | 1       | mc1    | 63.04% | ± 1.69% |
| truthfulqa_mc | 1       | mc2    | 78.39% | ± 1.37% |

Average: 70.71%

#### BIGBENCH

| Task                                              | Version | Metric                | Value  | StdErr  |
|---------------------------------------------------|---------|-----------------------|--------|---------|
| bigbench_causal_judgement                         | 0       | multiple_choice_grade | 60.00% | ± 3.56% |
| bigbench_date_understanding                       | 0       | multiple_choice_grade | 62.06% | ± 2.53% |
| bigbench_disambiguation_qa                        | 0       | multiple_choice_grade | 54.26% | ± 3.11% |
| bigbench_geometric_shapes                         | 0       | multiple_choice_grade | 23.96% | ± 2.26% |
|                                                   |         | exact_str_match       | 0.00%  | ± 0.00% |
| bigbench_logical_deduction_five_objects           | 0       | multiple_choice_grade | 32.80% | ± 2.10% |
| bigbench_logical_deduction_seven_objects          | 0       | multiple_choice_grade | 23.86% | ± 1.61% |
| bigbench_logical_deduction_three_objects          | 0       | multiple_choice_grade | 59.33% | ± 2.84% |
| bigbench_movie_recommendation                     | 0       | multiple_choice_grade | 58.00% | ± 2.21% |
| bigbench_navigate                                 | 0       | multiple_choice_grade | 56.00% | ± 1.57% |
| bigbench_reasoning_about_colored_objects          | 0       | multiple_choice_grade | 69.20% | ± 1.03% |
| bigbench_ruin_names                               | 0       | multiple_choice_grade | 55.36% | ± 2.35% |
| bigbench_salient_translation_error_detection      | 0       | multiple_choice_grade | 41.48% | ± 1.56% |
| bigbench_snarks                                   | 0       | multiple_choice_grade | 73.48% | ± 3.29% |
| bigbench_sports_understanding                     | 0       | multiple_choice_grade | 76.06% | ± 1.36% |
| bigbench_temporal_sequences                       | 0       | multiple_choice_grade | 55.50% | ± 1.57% |
| bigbench_tracking_shuffled_objects_five_objects   | 0       | multiple_choice_grade | 23.28% | ± 1.20% |
| bigbench_tracking_shuffled_objects_seven_objects  | 0       | multiple_choice_grade | 19.37% | ± 0.94% |
| bigbench_tracking_shuffled_objects_three_objects  | 0       | multiple_choice_grade | 59.33% | ± 2.84% |

Average: 55.37%

### Openllm Benchmark

| Task          | Version | Metric   | Value | StdErr |
|---------------|---------|----------|-------|--------|
| arc_challenge | 0       | acc      | 70.12 | ± 1.30 |
|               |         | acc_norm | 73.27 | ± 1.29 |
| hellaswag     | 0       | acc      | 71.80 | ± 0.44 |
|               |         | acc_norm | 89.20 | ± 0.30 |
| gsm8k         | 0       | acc      | 66.77 | ± 1.2  |
| winogrande    | 0       | acc      | 84.6  | ± 1.0  |

Average: 73.5%

#### TruthfulQA

| Task          | Version | Metric | Value | StdErr |
|---------------|---------|--------|-------|--------|
| truthfulqa_mc | 1       | mc1    | 62.79 | ± 1.69 |
|               |         | mc2    | 77.90 | ± 1.37 |

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1080
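For readers who want to mirror this schedule outside Axolotl, the sketch below shows a roughly equivalent optimizer and learning-rate schedule in plain PyTorch / 🤗 Transformers. It is illustrative only: the actual run used Axolotl's `paged_adamw_32bit` optimizer inside a DPO training loop, and the `torch.nn.Linear` module stands in for the trainable QLoRA parameters.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Stand-in module; in the real run only the QLoRA adapter weights are trainable.
model = torch.nn.Linear(8, 8)

# AdamW with the betas/epsilon listed above.
# Effective batch size 8 = micro-batch 1 x gradient accumulation 8.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-7,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
)

# Cosine decay over the 1080 training steps, with 100 warmup steps.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=1080,
)
```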
### 📝 Axolotl Configuration

```yaml
base_model: mlabonne/NeuralMonarch-7B
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

rl: dpo
chat_template: chatml
datasets:
  - path: mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
    split: train
    type: chatml.intel
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./out

adapter: qlora
lora_model_dir:

sequence_len: 1800
sample_packing: false
pad_to_sequence_len: false

lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - layers.1.self_attn.q_proj
  - layers.0.self_attn.q_proj
  - layers.15.self_attn.q_proj
  - layers.12.self_attn.q_proj
  - layers.11.self_attn.q_proj
  - layers.14.self_attn.q_proj
  - layers.9.self_attn.q_proj
  - layers.16.self_attn.q_proj
  - layers.30.self_attn.q_proj
  - layers.18.self_attn.q_proj
  - layers.13.self_attn.q_proj
  - layers.10.self_attn.q_proj
  - layers.7.self_attn.q_proj
  - layers.8.self_attn.q_proj
  - layers.4.self_attn.q_proj
  - layers.19.self_attn.q_proj
  - layers.27.self_attn.k_proj
  - layers.24.self_attn.k_proj
  - layers.25.self_attn.k_proj
  - layers.22.self_attn.k_proj
  - layers.26.self_attn.k_proj
  - layers.29.self_attn.k_proj
  - layers.23.self_attn.k_proj
  - layers.28.self_attn.k_proj
  - layers.21.self_attn.k_proj
  - layers.31.self_attn.k_proj
  - layers.30.self_attn.k_proj
  - layers.20.self_attn.k_proj
  - layers.5.self_attn.k_proj
  - layers.19.self_attn.k_proj
  - layers.17.self_attn.k_proj
  - layers.18.self_attn.k_proj
  - layers.19.self_attn.v_proj
  - layers.24.self_attn.v_proj
  - layers.18.self_attn.v_proj
  - layers.5.self_attn.v_proj
  - layers.3.self_attn.v_proj
  - layers.16.self_attn.v_proj
  - layers.23.self_attn.v_proj
  - layers.27.self_attn.v_proj
  - layers.25.self_attn.v_proj
  - layers.26.self_attn.v_proj
  - layers.20.self_attn.v_proj
  - layers.6.self_attn.v_proj
  - layers.15.self_attn.v_proj
  - layers.17.self_attn.v_proj
  - layers.29.self_attn.v_proj
  - layers.22.self_attn.v_proj
  - layers.12.self_attn.o_proj
  - layers.9.self_attn.o_proj
  - layers.14.self_attn.o_proj
  - layers.0.self_attn.o_proj
  - layers.6.self_attn.o_proj
  - layers.8.self_attn.o_proj
  - layers.10.self_attn.o_proj
  - layers.11.self_attn.o_proj
  - layers.13.self_attn.o_proj
  - layers.24.self_attn.o_proj
  - layers.7.self_attn.o_proj
  - layers.15.self_attn.o_proj
  - layers.5.self_attn.o_proj
  - layers.17.self_attn.o_proj
  - layers.25.self_attn.o_proj
  - layers.4.self_attn.o_proj
  - layers.31.mlp.gate_proj
  - layers.30.mlp.gate_proj
  - layers.4.mlp.gate_proj
  - layers.3.mlp.gate_proj
  - layers.29.mlp.gate_proj
  - layers.28.mlp.gate_proj
  - layers.6.mlp.gate_proj
  - layers.27.mlp.gate_proj
  - layers.5.mlp.gate_proj
  - layers.26.mlp.gate_proj
  - layers.25.mlp.gate_proj
  - layers.7.mlp.gate_proj
  - layers.2.mlp.gate_proj
  - layers.24.mlp.gate_proj
  - layers.23.mlp.gate_proj
  - layers.10.mlp.gate_proj
  - layers.6.mlp.up_proj
  - layers.4.mlp.up_proj
  - layers.5.mlp.up_proj
  - layers.27.mlp.up_proj
  - layers.25.mlp.up_proj
  - layers.26.mlp.up_proj
  - layers.17.mlp.up_proj
  - layers.24.mlp.up_proj
  - layers.7.mlp.up_proj
  - layers.10.mlp.up_proj
  - layers.3.mlp.up_proj
  - layers.11.mlp.up_proj
  - layers.23.mlp.up_proj
  - layers.9.mlp.up_proj
  - layers.14.mlp.up_proj
  - layers.18.mlp.up_proj
  - layers.19.mlp.down_proj
  - layers.20.mlp.down_proj
  - layers.18.mlp.down_proj
  - layers.21.mlp.down_proj
  - layers.29.mlp.down_proj
  - layers.1.mlp.down_proj
  - layers.22.mlp.down_proj
  - layers.28.mlp.down_proj
  - layers.23.mlp.down_proj
  - layers.30.mlp.down_proj
  - layers.17.mlp.down_proj
  - layers.4.mlp.down_proj
  - layers.2.mlp.down_proj
  - layers.15.mlp.down_proj
  - layers.5.mlp.down_proj

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 5e-7

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 1
eval_table_size:
eval_table_max_new_tokens: 128
save_steps: 1080
max_steps: 1080
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
```

### Framework versions

- Transformers 4.38.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.17.0
- Tokenizers 0.15.0
- Axolotl 0.4.0

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
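The per-layer `lora_target_modules` list in the configuration above is what implements the "half of the projections" setup described in the introduction. For reference, a comparable selection could be expressed outside Axolotl with a PEFT `LoraConfig` roughly as follows. This is a sketch, not the code that produced the model: it assumes PEFT's suffix matching of `target_modules` entries against module names, and the list is truncated for brevity.

```python
from peft import LoraConfig

# A few of the per-layer projections copied from the lora_target_modules list above;
# PEFT matches these strings against the end of each module's qualified name.
target_modules = [
    "layers.1.self_attn.q_proj",
    "layers.0.self_attn.q_proj",
    "layers.27.self_attn.k_proj",
    "layers.19.self_attn.v_proj",
    "layers.12.self_attn.o_proj",
    "layers.31.mlp.gate_proj",
    "layers.6.mlp.up_proj",
    "layers.19.mlp.down_proj",
    # ... remaining entries from the configuration above ...
]

peft_config = LoraConfig(
    r=16,                  # mirrors lora_r
    lora_alpha=16,         # mirrors lora_alpha
    lora_dropout=0.05,     # mirrors lora_dropout
    target_modules=target_modules,
    bias="none",
    task_type="CAUSAL_LM",
)
```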