How are its various parameters?

#1 opened by Inschrift-Spruch-Raum

After v11, you released v12, but it doesn't have benchmark results yet. I'm eager to know how strong it is. Can you tell me?

While the leaderboard's evaluations lag heavily, I can tell you what I expect of v12.

I share your excitement about v11, because it clearly shows how to get high reasoning and high IFEVAL out of a model_stock merge. A lot of work went into that, from Qwentinuum and then Vimarckoso - and I really like watching fellow mergers benefit from Lamarck's building blocks like this. Qwenvergence v11 has a major role in the upcoming Lamarck v0.8, in both its DeepSeek and non-DeepSeek variants.

The output I've gotten points to a high reasoning score and strong, detail-oriented IFEVAL - and I am having a blast with the prose.

Qwenvergence is a series of model_stock merges with different emphases; this v12 model is not a direct progression from v11, but a remake of a very early Qwenvergence (https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose) that surprised me with a GPQA score very close to @CultriX's (https://huggingface.co/CultriX/Qwen2.5-14B-Wernicke).

To my knowledge, https://huggingface.co/CultriX/Qwen2.5-14B-Hyperionv4 is the only 14B model to best Qwenvergence v3 Prose's GPQA score. Every decimal point of GPQA is hard-won, unlike IFEVAL. I've hoped for synergy between CultriX's models and my own, but our top scorers must be near optimal, because nearly every merge loses GPQA.

Is this likely to improve on v3 Prose in every way apart from GPQA? Almost certainly. What will its GPQA be? Quite frankly, I'm betting on 18.9 - 19.2, but I'm as hopeful as you are to see more.

name:                Qwenvergence-14B-v3-Prose
merge_method:        model_stock
base_model:          Qwen/Qwen2.5-14B
tokenizer_source:    base
parameters:
  int8_mask:         true
  normalize:         true
  rescale:           false
models:
  - model:           EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
  - model:           oxyapi/oxy-1-small
  - model:           allura-org/TQ2.5-14B-Sugarquill-v1
  - model:           arcee-ai/Virtuoso-Small
  - model:           v000000/Qwen2.5-Lumen-14B
  - model:           underwoods/medius-erebus-magnum-14b
  - model:           sthenno-com/miscii-14b-1028
  - model:           sthenno-com/miscii-14b-1028
  - model:           huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
dtype:               bfloat16
out_dtype:           bfloat16

name:                Qwenvergence-14B-v12-Prose-DS
merge_method:        model_stock
base_model:          sometimesanotion/Base-Chocolatine-2-14B-Instruct-v2.0b3
tokenizer_source:    base
dtype:               float32
out_dtype:           bfloat16
parameters:
  int8_mask:         true
  normalize:         true
  rescale:           false
models:
  - model:           EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
  - model:           oxyapi/oxy-1-small
  - model:           allura-org/TQ2.5-14B-Sugarquill-v1
  - model:           jpacifico/Chocolatine-2-14B-Instruct-v2.0b3
  # The '+' syntax applies the LoRA adapter on the right to the model on the left before it joins the stock
  - model:           sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
  - model:           underwoods/medius-erebus-magnum-14b
  - model:           sthenno/tempesthenno-ppo-ckpt40+sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
  - model:           huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
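
For anyone who wants to reproduce a merge like this, here is a minimal sketch of running a saved config. The one-step CLI route is mergekit-yaml config.yml ./output-dir; the Python route below follows mergekit's documented Python API, though option names can shift between versions, and the config and output paths are placeholders of my own.

# Minimal sketch: execute a mergekit YAML config such as the ones above.
# Assumes `pip install mergekit`; paths and option values are illustrative.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "qwenvergence-14b-v12-prose-ds.yml"  # the YAML above, saved to disk
OUT_PATH = "./Qwenvergence-14B-v12-Prose-DS"      # directory for the merged model

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU if one is present
        copy_tokenizer=True,             # carry the base model's tokenizer into the output
        lazy_unpickle=True,              # reduce peak memory while loading checkpoints
    ),
)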

The much-anticipated benchmarks are in! Sure enough, compared to v3 Prose, this model slips by a hair on BBH and GPQA, but makes large gains elsewhere.

The whole point of the Qwenvergence Prose models is rich prose and reasoning; the only reason they aren't a larger part of Lamarck is that they can go a bit off the rails with precise technical language and translation. They should be excellent for creative writing, and they are important in specific layer ranges of Lamarck.

@CultriX , @sthenno , I know you have your subjective preferences, but check this one out!

Excellent! I can't wait to quantify it

@jpacifico , I want to thank you for your contribution. Merges and LoRAs with your Chocolatine finetune have been scoring high and rendering terrific prose. I am becoming very confident of an outstanding Lamarck v0.8.

Excellent!
