---
base_model:
- Krystalan/DRT-o1-14B
- sometimesanotion/Lamarck-14B-v0.3
- sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
- CultriX/Qwen2.5-14B-Hyperionv4
- sometimesanotion/Qwenvergence-14B-v9
- sometimesanotion/Qwenvergence-14B-v9
- sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40
- sometimesanotion/Lamarck-14B-v0.6
- sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
- sthenno/tempesthenno-ppo-ckpt40
- sometimesanotion/Qwenvergence-14B-v3-Prose
- sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
---
# Notes

For a model_stock merge, this has greatly exceeded my expectations. It beats Lamarck v0.7's average without introducing DeepSeek elements, mostly by scoring high on MATH without giving up much elsewhere. It also shows that the high-scoring Qwen2.5 14B merges are converging near the limits of the architecture.

Here is how it benchmarks alongside the models it merges.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/665fef5a4794222f6a2fe605/Vj2f_8kD9GBeWr0SEj9qd.png)

### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) as the base.

### Models Merged

The following models were included in the merge:

* [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B)
* [sometimesanotion/Lamarck-14B-v0.3](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3) + [sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40)
* [CultriX/Qwen2.5-14B-Hyperionv4](https://huggingface.co/CultriX/Qwen2.5-14B-Hyperionv4)
* [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) + [sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40)
* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6) + [sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40)
* [sthenno/tempesthenno-ppo-ckpt40](https://huggingface.co/sthenno/tempesthenno-ppo-ckpt40)
* [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose) + [sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40](https://huggingface.co/sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
name: Qwenvergence-14B-v11
merge_method: model_stock
base_model: sometimesanotion/Qwenvergence-14B-v9
tokenizer_source: base
dtype: bfloat16
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
models:
  - model: sometimesanotion/Lamarck-14B-v0.6+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
  - model: sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
  - model: sometimesanotion/Qwenvergence-14B-v9+sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40
  - model: sometimesanotion/Lamarck-14B-v0.3+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
  - model: sometimesanotion/Lamarck-14B-v0.6+sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
  - model: CultriX/Qwen2.5-14B-Hyperionv4
  - model: Krystalan/DRT-o1-14B
  - model: sthenno/tempesthenno-ppo-ckpt40
```
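
### How the Model Stock Step Works

For readers new to the method: Model Stock averages the fine-tuned checkpoints, then pulls that average back toward the base model by a ratio derived from the angle between the checkpoints' task vectors (their deltas from the base). The snippet below is a simplified per-tensor illustration of that idea, not mergekit's exact implementation; the multi-model ratio follows the paper's generalization `t = N*cos(θ) / (1 + (N-1)*cos(θ))`.

```python
import itertools

import torch


def model_stock_tensor(base: torch.Tensor, tuned: list[torch.Tensor]) -> torch.Tensor:
    """Simplified per-tensor Model Stock interpolation (illustrative, not mergekit's code)."""
    n = len(tuned)
    assert n >= 2, "the ratio is defined from pairwise angles, so at least two checkpoints are needed"
    # Task vectors: each fine-tuned checkpoint's delta from the base.
    deltas = [(w - base).flatten() for w in tuned]
    # Average pairwise cosine similarity between task vectors; clamp guards this sketch
    # against negative similarity.
    cos = torch.stack([
        torch.nn.functional.cosine_similarity(a, b, dim=0)
        for a, b in itertools.combinations(deltas, 2)
    ]).mean().clamp(min=0.0)
    # Interpolation ratio from the paper: t = N*cos / (1 + (N-1)*cos).
    t = n * cos / (1 + (n - 1) * cos)
    # Move the plain average of the checkpoints back toward the base by (1 - t).
    return t * torch.stack(tuned).mean(dim=0) + (1 - t) * base
```

In this merge, mergekit applies that interpolation per weight tensor across the models listed in the configuration, with Qwenvergence-14B-v9 held out as the base.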
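
### Reproducing the Merge

The YAML above can be fed directly to mergekit's `mergekit-yaml` entry point. The sketch below assumes mergekit is installed (`pip install mergekit`) and that the configuration has been saved as `qwenvergence-v11.yaml` (an illustrative filename); it simply shells out to the CLI.

```python
import subprocess

# Assumes the YAML config from this card is saved as qwenvergence-v11.yaml.
# Writes the merged model to ./Qwenvergence-14B-v11; see mergekit's README
# for GPU and low-memory options.
subprocess.run(
    ["mergekit-yaml", "qwenvergence-v11.yaml", "./Qwenvergence-14B-v11"],
    check=True,
)
```

mergekit resolves the `model+LoRA` entries by applying each LoRA adapter to its base checkpoint before the stock merge, so expect enough disk space for all of the source models.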
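
### Usage

A minimal inference sketch with Transformers, assuming this card is published as `sometimesanotion/Qwenvergence-14B-v11` (the `name` in the configuration above) and that the chat template is inherited from the Qwen2.5 base via `tokenizer_source: base`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sometimesanotion/Qwenvergence-14B-v11"  # assumed repo id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's out_dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve: if 3x + 7 = 25, what is x?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```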