---
base_model:
- sometimesanotion/Qwenvergence-14B-v3-Prose
- sometimesanotion/LoRA-la128
- Krystalan/DRT-o1-14B
- huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated
- sometimesanotion/Lamarck-14B-v0.7
- sometimesanotion/Lamarck-14B-v0.3
- sometimesanotion/LoRA-la128
- sometimesanotion/Qwenvergence-14B-v9
- sometimesanotion/LoRA-la128
- sometimesanotion/Qwenvergence-14B-v9
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---

# Notes

Qwenvergence is a component of the [Lamarck project](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7), which, as the first step of a complex merge strategy, iteratively merges a model_stock alongside its previous version.

Some of the merged models have LoRAs pre-applied. In this case, a rank-128 adapter from Lamarck 0.7 was used to prevent sharp regressions in its performance.

I attribute this model's MATH score of 44.18%, a record for a 14B model on the Open LLM Leaderboard, to its combination of Krystalan/DRT-o1-14B and huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated. Both are strong models individually, but merged, they show synergy in this area.

# Merge method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) as a base.
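
For readers unfamiliar with Model Stock, the sketch below illustrates the per-layer interpolation rule as described in the linked paper: the average of the fine-tuned weights is pulled back toward the base weights by a ratio `t` derived from the angle between the fine-tuned deltas. This is a toy, pure-Python illustration on flat lists of floats; the function name `model_stock_layer` is hypothetical, and mergekit's actual implementation may differ in how it estimates the angle and handles tensors.

```python
import math
from itertools import combinations


def model_stock_layer(base, finetuned):
    """Toy Model Stock interpolation for one layer (flat lists of floats).

    Returns (merged_weights, t). Assumes at least two fine-tuned models.
    """
    n = len(finetuned)
    if n < 2:
        raise ValueError("need at least two fine-tuned models")

    # Deltas of each fine-tuned layer relative to the base layer.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm > 0 else 0.0

    # Average pairwise cosine similarity between the deltas.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(u, v) for u, v in pairs) / len(pairs)

    # Interpolation ratio: t = N cos(theta) / (1 + (N - 1) cos(theta)).
    denom = 1 + (n - 1) * cos_theta
    if abs(denom) < 1e-12:
        raise ValueError("degenerate angle: interpolation ratio is undefined")
    t = n * cos_theta / denom

    # Blend the plain average of the fine-tuned weights with the base weights.
    avg = [sum(col) / n for col in zip(*finetuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0]
    tuned = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.3]]
    merged, t = model_stock_layer(base, tuned)
    print("t =", t, "merged =", merged)
```

When the deltas point in nearly the same direction, `t` approaches 1 and the result is close to the plain average; when they are nearly orthogonal, `t` approaches 0 and the result stays close to the base model.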

### Models Merged

The following models were included in the merge:

* [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose) + [sometimesanotion/LoRA-la128](https://huggingface.co/sometimesanotion/LoRA-la128)
* [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B)
* [huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated](https://huggingface.co/huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated)
* [sometimesanotion/Lamarck-14B-v0.7](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7)
* [sometimesanotion/Lamarck-14B-v0.3](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3) + [sometimesanotion/LoRA-la128](https://huggingface.co/sometimesanotion/LoRA-la128)
* [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) + [sometimesanotion/LoRA-la128](https://huggingface.co/sometimesanotion/LoRA-la128)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
name: Qwenvergence-14B-v10
merge_method: model_stock
base_model: sometimesanotion/Qwenvergence-14B-v9
tokenizer_source: base
dtype: float32
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
models:
  - model: sometimesanotion/Lamarck-14B-v0.7
  - model: sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated
  - model: sometimesanotion/Lamarck-14B-v0.3+sometimesanotion/LoRA-la128
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated
  - model: Krystalan/DRT-o1-14B
  - model: sometimesanotion/Qwenvergence-14B-v9+sometimesanotion/LoRA-la128
  - model: sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128
```
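
The `models` list in the configuration above has eight entries but only six distinct ones, and three of them carry a `+sometimesanotion/LoRA-la128` suffix. The short sketch below tallies the entries, copied verbatim from the configuration, to make that visible. It assumes the `+` suffix denotes a LoRA applied to the model before merging, which matches the note about pre-applied LoRAs; whether the repeated entries are a deliberate way of weighting those models is not stated here. The helper name `split_lora` is hypothetical.

```python
from collections import Counter

# Entries copied from the `models` list in the configuration above.
MODELS = [
    "sometimesanotion/Lamarck-14B-v0.7",
    "sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128",
    "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated",
    "sometimesanotion/Lamarck-14B-v0.3+sometimesanotion/LoRA-la128",
    "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated",
    "Krystalan/DRT-o1-14B",
    "sometimesanotion/Qwenvergence-14B-v9+sometimesanotion/LoRA-la128",
    "sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128",
]


def split_lora(entry):
    """Split a 'model+adapter' entry into (model, adapter), adapter may be None."""
    model, _, adapter = entry.partition("+")
    return model, (adapter or None)


# How often each entry appears in the list.
counts = Counter(MODELS)

# Entries that are listed more than once.
repeated = {entry: n for entry, n in counts.items() if n > 1}

# Distinct models that are merged with an adapter applied.
with_lora = [split_lora(entry)[0] for entry in counts if split_lora(entry)[1]]

if __name__ == "__main__":
    print("total entries:", len(MODELS), "distinct:", len(counts))
    print("repeated:", repeated)
    print("with LoRA:", with_lora)
```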