---
base_model:
- sometimesanotion/Qwenvergence-14B-v3-Prose
- sometimesanotion/LoRA-la128
- Krystalan/DRT-o1-14B
- huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated
- sometimesanotion/Lamarck-14B-v0.7
- sometimesanotion/Lamarck-14B-v0.3
- sometimesanotion/LoRA-la128
- sometimesanotion/Qwenvergence-14B-v9
- sometimesanotion/LoRA-la128
- sometimesanotion/Qwenvergence-14B-v9
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---
# Notes

Qwenvergence is a component of the [Lamarck project](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7), which iteratively merges a `model_stock` alongside its previous version as the first step of a more complex merge strategy.

Some of the models have LoRAs applied before merging. In this case, a rank 128 adapter from Lamarck 0.7 was used to prevent sharp regressions in performance.
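As an illustration of what applying a LoRA to a weight matrix means, here is a minimal NumPy sketch of the standard LoRA update `W + (alpha / r) * B @ A`. The shapes, the `alpha` value, and the function name `apply_lora` are assumptions chosen for a toy example; the adapter used here is rank 128, and mergekit handles the real application through its `model+lora` syntax.

```python
import numpy as np

def apply_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into a dense weight matrix.

    weight: (d_out, d_in), lora_a: (rank, d_in), lora_b: (d_out, rank).
    """
    # The low-rank product B @ A has the same shape as the weight.
    return weight + (alpha / rank) * (lora_b @ lora_a)

# Toy dimensions (hypothetical); a real rank 128 adapter works the same way.
rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 16, 4, 8.0
weight = rng.normal(size=(d_out, d_in))
lora_a = rng.normal(size=(rank, d_in))
lora_b = rng.normal(size=(d_out, rank))

merged = apply_lora(weight, lora_a, lora_b, alpha, rank)
# The change to the weight has rank at most `rank`.
delta_rank = np.linalg.matrix_rank(merged - weight)
```

Because the update is low rank, it nudges a model toward the adapter's source without replacing its weights outright.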

I attribute this model's MATH score of 44.18%, a record for a 14B model on the Open LLM Leaderboard, to its combination of Krystalan/DRT-o1-14B and huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated. Both are strong models individually, and math is an area where they show synergy when merged.

# Merge method

This model was merged with the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, using [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) as the base.
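For intuition, here is a minimal NumPy sketch of the Model Stock interpolation for a single weight tensor, assuming the formulation in the linked paper: with `N` fine-tuned models and an angle `θ` between their task vectors, the ratio is `t = N·cos θ / (1 + (N − 1)·cos θ)` and the result is `t · w_avg + (1 − t) · w_base`. Estimating `cos θ` as the mean pairwise cosine similarity is an assumption of this sketch, and `model_stock_layer` is a hypothetical name; this is not mergekit's actual code.

```python
import numpy as np

def model_stock_layer(base, finetuned):
    """Interpolate one weight tensor between the base and the fine-tuned average."""
    n = len(finetuned)
    if n < 2:
        raise ValueError("need at least two fine-tuned tensors")
    # Task vectors: how each fine-tuned tensor differs from the base.
    deltas = [(w - base).ravel() for w in finetuned]
    # Mean pairwise cosine similarity between task vectors.
    cos_vals = []
    for i in range(n):
        for j in range(i + 1, n):
            denom = np.linalg.norm(deltas[i]) * np.linalg.norm(deltas[j]) + 1e-12
            cos_vals.append(float(deltas[i] @ deltas[j]) / denom)
    cos_theta = float(np.mean(cos_vals))
    # Interpolation ratio: 1 keeps the average, 0 falls back to the base.
    t = n * cos_theta / (1.0 + (n - 1) * cos_theta)
    w_avg = np.mean(finetuned, axis=0)
    return t * w_avg + (1.0 - t) * base, t

base = np.zeros(4)
# Orthogonal task vectors: the models disagree, so the result stays at the base.
orth, t_orth = model_stock_layer(base, [np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])])
# Parallel task vectors: the models agree, so the result follows their average.
para, t_para = model_stock_layer(base, [np.array([1.0, 0, 0, 0]), np.array([2.0, 0, 0, 0])])
```

The more the fine-tuned models agree on a direction away from the base, the closer the merged tensor moves to their average.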

### Models Merged

The following models were included in the merge:
* [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose) + [sometimesanotion/LoRA-la128](https://huggingface.co/sometimesanotion/LoRA-la128)
* [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B)
* [huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated](https://huggingface.co/huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated)
* [sometimesanotion/Lamarck-14B-v0.7](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7)
* [sometimesanotion/Lamarck-14B-v0.3](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3) + [sometimesanotion/LoRA-la128](https://huggingface.co/sometimesanotion/LoRA-la128)
* [sometimesanotion/Qwenvergence-14B-v9](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v9) + [sometimesanotion/LoRA-la128](https://huggingface.co/sometimesanotion/LoRA-la128)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
name:                Qwenvergence-14B-v10
merge_method:        model_stock
base_model:          sometimesanotion/Qwenvergence-14B-v9
tokenizer_source:    base
dtype:               float32
out_dtype:           bfloat16
parameters:
  int8_mask:         true
  normalize:         true
  rescale:           false
models:
  - model:           sometimesanotion/Lamarck-14B-v0.7
  - model:           sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128
  - model:           huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated
  - model:           sometimesanotion/Lamarck-14B-v0.3+sometimesanotion/LoRA-la128
  - model:           huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated
  - model:           Krystalan/DRT-o1-14B
  - model:           sometimesanotion/Qwenvergence-14B-v9+sometimesanotion/LoRA-la128
  - model:           sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128
```
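The `models` entries above use mergekit's `model+lora` syntax, and two of them appear twice. The short Python sketch below splits each entry into its model and adapter parts and reports the repeats. The helper name `split_entry` is hypothetical, and the sketch makes no claim about how mergekit treats repeated entries.

```python
from collections import Counter

# The eight `models` entries, copied from the configuration above.
MODEL_ENTRIES = [
    "sometimesanotion/Lamarck-14B-v0.7",
    "sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128",
    "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated",
    "sometimesanotion/Lamarck-14B-v0.3+sometimesanotion/LoRA-la128",
    "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated",
    "Krystalan/DRT-o1-14B",
    "sometimesanotion/Qwenvergence-14B-v9+sometimesanotion/LoRA-la128",
    "sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-la128",
]

def split_entry(entry):
    """Split 'model+lora' into (model, lora); lora is None when absent."""
    model, _, adapter = entry.partition("+")
    return model, adapter or None

counts = Counter(MODEL_ENTRIES)
repeated = sorted(name for name, c in counts.items() if c > 1)

for entry, count in counts.items():
    model, adapter = split_entry(entry)
    print(f"{count}x {model} (adapter: {adapter})")
```

The six distinct entries correspond to the six models listed under "Models Merged".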