Jim Lai

grimjim

AI & ML interests

Experimenting primarily with 7B-12B parameter text completion models. Not all models are intended for direct use; some are aimed at research and/or educational purposes.



grimjim's activity

updated a model 5 days ago

grimjim/Magnolia-v5a-12B

Text Generation • Updated 5 days ago • 19 • 1

published a model 5 days ago

grimjim/Magnolia-v5a-12B

Text Generation • Updated 5 days ago • 19 • 1

posted an update 12 days ago


This recent paper points to an explanation for the unreasonable effectiveness of frankenmerges: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach" (2502.05171).

Specifically, duplicating layers in a frankenmerge serves a purpose similar to the recurrence in the paper's recurrent-depth architecture. Successful frankenmerges that work without additional fine-tuning are able to recover, or "heal," from any damage caused by abrupt transitions between layer blocks. Replicated layer blocks that remain functional can provide benefits grounded in latent reasoning. Frankenmerges can also produce hybrid reasoning by splicing together the latent reasoning of different models.
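As a deliberately simplified illustration (it does not model the paper's actual architecture or its specific design details), a stack that repeats a contiguous block of layers computes exactly what a loop over that block would compute with shared weights. The toy `make_layer`, `run_stack`, and `run_recurrent` helpers below are hypothetical stand-ins for transformer blocks.

```python
import math
from typing import Callable, List

# A toy "layer" maps a hidden state (list of floats) to a new hidden state.
Layer = Callable[[List[float]], List[float]]


def make_layer(scale: float, shift: float) -> Layer:
    """Hypothetical toy residual layer: h + tanh(scale * h + shift), elementwise."""
    def layer(h: List[float]) -> List[float]:
        return [x + math.tanh(scale * x + shift) for x in h]
    return layer


def run_stack(layers: List[Layer], h: List[float]) -> List[float]:
    """Apply each layer once, in order, like a plain feed-forward stack."""
    for layer in layers:
        h = layer(h)
    return h


def run_recurrent(prelude: List[Layer], core: List[Layer], coda: List[Layer],
                  h: List[float], r: int) -> List[float]:
    """Apply the prelude once, the shared core block r times, then the coda."""
    h = run_stack(prelude, h)
    for _ in range(r):
        h = run_stack(core, h)
    return run_stack(coda, h)


layers = [make_layer(0.5 + 0.1 * i, 0.05 * i) for i in range(6)]
prelude, core, coda = layers[:2], layers[2:4], layers[4:]

# Frankenmerge-style stack: the contiguous block (layers 2-3) appears twice.
duplicated = prelude + core + core + coda
x = [0.1, -0.2, 0.3]
print(run_stack(duplicated, x) == run_recurrent(prelude, core, coda, x, r=2))  # True
```

In a real frankenmerge the duplicated layers were never trained to consume their own outputs, which is where the transition damage and "healing" described above come in; the toy only shows the structural equivalence.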

Back in April 2024, I duplicated a few layers of the Llama 3 8B model, turning it into a 9B model without significantly harming benchmark scores, despite any transition damage.
grimjim/llama-3-experiment-v1-9B
My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
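One way to picture such an experiment: a layer-duplication plan is a list of source-layer indices with one contiguous block repeated, and the parameter cost follows from the layer dimensions. The sketch below assumes Llama 3 8B's configuration values (32 layers, hidden size 4096, MLP size 14336, 32 query heads and 8 key/value heads); the helper names and the choice of layers 20-23 are hypothetical, not the layers used in the actual experiment.

```python
from typing import List


def duplicate_block(n_layers: int, start: int, end: int) -> List[int]:
    """Source-layer order for a stack in which layers [start, end) run twice.

    The only new seam is from layer end-1 back to layer start; this is the
    transition location whose placement the results are sensitive to.
    """
    if not 0 <= start < end <= n_layers:
        raise ValueError("block must lie within the model's layers")
    return list(range(0, end)) + list(range(start, n_layers))


def llama_layer_params(hidden: int, intermediate: int,
                       n_heads: int, n_kv_heads: int) -> int:
    """Parameters in one bias-free Llama-style decoder layer (GQA attention)."""
    head_dim = hidden // n_heads
    attn = 2 * hidden * hidden                  # q_proj and o_proj
    attn += 2 * hidden * n_kv_heads * head_dim  # k_proj and v_proj
    mlp = 3 * hidden * intermediate             # gate, up and down projections
    norms = 2 * hidden                          # two RMSNorm weight vectors
    return attn + mlp + norms


# Assumed Llama 3 8B dimensions; the 4-layer block is a hypothetical choice.
per_layer = llama_layer_params(4096, 14336, 32, 8)
order = duplicate_block(32, 20, 24)
print(len(order), per_layer)  # 36 218112000
```

Under these assumptions each repeated layer adds about 0.22B parameters, so repeating a block of four layers would take the roughly 8.0B-parameter model to roughly 8.9B.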


New activity in open-llm-leaderboard/open_llm_leaderboard 12 days ago

Spurious `trust_remote_code=True` objection when submitting a model?

#1100 opened 12 days ago by grimjim

updated a model 13 days ago

grimjim/Magnolia-v5-12B

Text Generation • Updated 13 days ago • 40

published a model 13 days ago

grimjim/Magnolia-v5-12B

Text Generation • Updated 13 days ago • 40

New activity in grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B 18 days ago

Adding Evaluation Results

#1 opened 19 days ago by T145

posted an update 19 days ago


I've made yet another merge of reasoning models with incremental gains on the current Open LLM leaderboard.
open-llm-leaderboard/open_llm_leaderboard

Merging DeepSeek's R1 distillation into Llama 3.1 8B with a prior best merge (at a task arithmetic weight of 10%, using the Llama 3.1 8B base model rather than the instruct model as the base) resulted in a slightly lower IFEval score but higher scores on every other benchmark except MMLU-PRO, which declined only marginally. MATH Lvl5 and GPQA rose noticeably.
grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B
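Task arithmetic, as commonly defined, adds weighted "task vectors" (each fine-tuned model minus a shared base model) back onto that base, which is why the choice of base model matters: using the base rather than the instruct model changes every task vector. The sketch below is a minimal illustration with plain Python lists standing in for weight tensors; the toy values and the 0.9 weight on the prior merge are assumptions, since only the 10% weight for the R1 distillation is stated.

```python
from typing import Dict, List

Tensor = List[float]           # stand-in for a real weight tensor
StateDict = Dict[str, Tensor]  # parameter name -> weights


def task_arithmetic(base: StateDict, models: List[StateDict],
                    weights: List[float]) -> StateDict:
    """merged = base + sum_i weights[i] * (models[i] - base), per parameter."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    merged: StateDict = {}
    for name, base_t in base.items():
        out = list(base_t)
        for model, w in zip(models, weights):
            # The task vector for this parameter is (fine-tuned - base).
            for j, (m, b) in enumerate(zip(model[name], base_t)):
                out[j] += w * (m - b)
        merged[name] = out
    return merged


base = {"w": [1.0, 2.0]}   # e.g. the Llama 3.1 8B base model
prior = {"w": [1.5, 2.0]}  # e.g. a prior best merge (hypothetical values)
r1 = {"w": [1.0, 4.0]}     # e.g. an R1 distillation (hypothetical values)
merged = task_arithmetic(base, [prior, r1], weights=[0.9, 0.1])
print([round(v, 6) for v in merged["w"]])  # [1.45, 2.2]
```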

This is currently my best Llama 3.1 8B merge result. The R1 distillation on its own scored quite poorly, so this appears to be another case of unexpected output formatting (reflected in IFEval) hurting evaluation results and obscuring the strength of a model.

This model can also be used to generate roleplay completions. Based on informal testing, its bias toward problem-solving subtly affects narration.

updated a collection 20 days ago

Highlighted work

Collection

My "greatest hits", sort of • 11 items • Updated 12 days ago • 4

New activity in google/gemma-2-2b-it 26 days ago

SLERP merge example code?

#20 opened 7 months ago by grimjim

published a model 27 days ago

grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

Text Generation • Updated 18 days ago • 59 • 4

updated a model 27 days ago

grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

Text Generation • Updated 18 days ago • 59 • 4

updated a collection 27 days ago

Highlighted work

Collection

My "greatest hits", sort of • 11 items • Updated 12 days ago • 4

published a model 27 days ago

grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B-GGUF

Text Generation • Updated 27 days ago • 271

updated a model 27 days ago

grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B-GGUF

Text Generation • Updated 27 days ago • 271

posted an update 29 days ago


A recent merge has provided another interesting result on the current Open LLM leaderboard.
open-llm-leaderboard/open_llm_leaderboard

Combining an o1 reasoning merge with VAGOsolutions' Llama-3.1 SauerkrautLM 8B Instruct model resulted in a lower IFEval score but higher scores on every other benchmark. This is currently my best Llama 3.1 8B merge result.
grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B
The results suggest that defects in output formatting and/or output parsing may be limiting the benchmark performance of various o1 models.
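As an illustration of how output format alone can mask a correct answer, the sketch below contrasts a hypothetical strict answer extractor with a more lenient one applied to the same correct answer. Neither reflects the Open LLM Leaderboard's actual evaluation code; the `Answer:` convention and both functions are assumptions.

```python
import re
from typing import Optional


def strict_extract(output: str) -> Optional[str]:
    """Accept only an output consisting solely of 'Answer: <integer>'."""
    m = re.fullmatch(r"Answer: (-?\d+)", output.strip())
    return m.group(1) if m else None


def lenient_extract(output: str) -> Optional[str]:
    """Take the last 'Answer: <integer>' found anywhere, ignoring reasoning text."""
    matches = re.findall(r"Answer:\s*(-?\d+)", output)
    return matches[-1] if matches else None


terse = "Answer: 17"
reasoning = "Let me think step by step.\n3 * 4 = 12, and 12 + 5 = 17.\nAnswer: 17"

print(strict_extract(terse), lenient_extract(terse))          # 17 17
print(strict_extract(reasoning), lenient_extract(reasoning))  # None 17
```

A reasoning-style model that produces the right answer after visible working would score zero under the strict parser, which is the kind of gap between capability and measured score the post describes.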

updated a model 29 days ago

grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B

Text Generation • Updated 29 days ago • 71 • 2

New activity in grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B 29 days ago

Adding Evaluation Results

#1 opened 29 days ago by T145

New activity in FreedomIntelligence/HuatuoGPT-o1-8B 29 days ago

Please submit this model to the Open LLM Leaderboard

#1 opened about 2 months ago by grimjim