Code Directional Enhancement for Language Models: A Novel Approach to Specialization without Fine-Tuning
"Even though My experiments and ideas may seem unconventional, wouldn't it be significant if they proved to be effective?
After all, nothing starts out perfect.
The vast realm of AI is like a great wall—while we may not be able to completely cross it, isn't simply climbing up and seeing beyond it still a step forward?
What I am doing now is an attempt to provide a path that allows us to look beyond that wall.
May divine blessings and great wealth be upon all AI researchers who dedicate themselves to exploring these frontiers and pushing the boundaries of the unknown."
This model is by "AI JOAH".

Overview
This model was made by muzerai, aka "AI JOAH", using deepseek-ai/DeepSeek-R1-Distill-Llama-8B (for testing purposes).
Subscribe to my YouTube Channel AI JOAH
This project presents a methodology for enhancing specific capabilities of language models using the Directional Enhancement technique. This approach does not introduce new knowledge into the model but amplifies its existing latent abilities. While preserving the general capabilities of the language model, it significantly improves performance in specific domains such as coding reasoning.
This is a speculative code reasoning enhancement version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
If `enhance.txt` is changed to target a different domain, the same approach can be adapted accordingly. This test uses 500 instructions for specialization in code reasoning.
Dataset reference for the 500 samples (instructions only): jtatman/python-code-dataset-500k.
Technical Background
Principle of Directional Enhancement
This approach identifies a specialization direction in the representation space of the language model, associated with a specific capability, and enhances the model’s attention weights in that direction.
- Compute the difference in representation between specialized prompts (domain-specific) and general prompts within the model's hidden states.
- Normalize this difference vector to obtain the specialization direction.
- Enhance the model's self-attention output projection weights (`o_proj`) along this specialized direction.
This method strengthens the model’s intrinsic abilities rather than introducing completely new knowledge or patterns. It functions similarly to how a lens amplifies a specific wavelength of light.
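In symbols, the update is a rank-1 adjustment of each output-projection matrix. Writing $W$ for an `o_proj` weight matrix, $d$ for the unit-norm specialization direction, and $\alpha$ for the enhancement factor (notation introduced here for illustration; it mirrors the core algorithm listed below):

$$
W' = W + \alpha \,(W d)\, d^{\top}
$$

Each row of $W$ is nudged along $d$ in proportion to how strongly it already points in that direction, which is why the method amplifies existing behavior rather than injecting new knowledge.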
Computing Specialization Direction
Unlike conventional fine-tuning, which modifies all weights in the model, this approach identifies a targeted enhancement direction by analyzing differences in activations across specialized and general inputs.
- A set of specialized prompts (`enhance.txt`) and general prompts (`normal.txt`) are fed into the model.
- The activations of a chosen hidden layer are extracted for both prompt types.
- The mean hidden state vector for specialized prompts is computed and compared to the mean hidden state vector for general prompts.
- Their difference represents the specialization direction, which is then normalized to create a unit vector.
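A minimal sketch of this computation, assuming a Hugging Face `transformers` causal LM and one prompt per line in each file; function and variable names are illustrative, not the exact script used to build this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to(device)
model.eval()

layer_idx = int(0.6 * model.config.num_hidden_layers)  # ~60% of the layer stack

def mean_hidden_state(prompts):
    """Average the chosen layer's hidden states over all tokens of all prompts."""
    vecs = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        # out.hidden_states[layer_idx]: (1, seq_len, hidden_size) -> mean over tokens
        vecs.append(out.hidden_states[layer_idx][0].float().mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

specialized_prompts = open("enhance.txt").read().splitlines()
general_prompts = open("normal.txt").read().splitlines()

specialized_mean = mean_hidden_state(specialized_prompts)
general_mean = mean_hidden_state(general_prompts)

# Difference of mean activations, normalized to a unit vector
specialization_dir = specialized_mean - general_mean
specialization_dir = specialization_dir / specialization_dir.norm()
```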
Enhancing Model Weights
Once the specialization direction is computed, it is applied to modify the model's self-attention output projection weights (`o_proj`) in a controlled manner:
- The specialization direction is projected onto the weight matrix of each attention layer.
- A scaled enhancement factor is applied to align the model’s attention outputs more strongly with the specialization direction.
- This process amplifies the model’s responses in the desired direction without altering its fundamental structure.
This targeted adjustment allows the model to focus more on specific characteristics (e.g., code reasoning) while maintaining general competency.
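A minimal sketch of this write-back step, assuming the Llama-style module layout of DeepSeek-R1-Distill-Llama-8B (`model.model.layers[i].self_attn.o_proj`) and the `specialization_dir` from the sketch above; the output directory name is hypothetical:

```python
import torch

enhancement_factor = 1.5  # default strength (see "Key Parameters" below)

with torch.no_grad():
    for layer in model.model.layers:
        W = layer.self_attn.o_proj.weight.data                      # (hidden, hidden)
        d = specialization_dir.to(dtype=W.dtype, device=W.device)   # unit direction
        projection_scalars = torch.matmul(W, d)          # how much each row points along d
        projection = torch.outer(projection_scalars, d)  # rank-1 component along d
        layer.self_attn.o_proj.weight.data = W + enhancement_factor * projection

model.save_pretrained("enhanced-model")      # then convert/quantize to GGUF as needed
tokenizer.save_pretrained("enhanced-model")
```

Because the update is a single rank-1 matrix added to each `o_proj`, the memory and compute cost is negligible compared to fine-tuning.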
Implementation Details
Data Preparation
Two types of datasets are used to define the specialization direction:
- Specialized Dataset (`enhance.txt`): Contains prompts focused on the capability to be enhanced.
- General Dataset (`normal.txt`): Contains diverse, neutral prompts to serve as a baseline.
The difference in activations between these two datasets defines the specialization direction, ensuring that the enhancement is aligned with the target capability while preserving the model’s general functionality.
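For illustration only, here are hypothetical examples of the two prompt styles, written in the one-prompt-per-line format assumed by the earlier sketch (the actual `enhance.txt` for this model was built from 500 instructions taken from jtatman/python-code-dataset-500k):

```python
# Hypothetical prompt examples; not taken from the actual files.
specialized = [
    "Write a Python function that parses a log file and returns error counts by type.",
    "Explain the time complexity of this recursive Fibonacci implementation and optimize it.",
]
general = [
    "Describe your favorite season and explain why you enjoy it.",
    "Summarize the plot of a well-known novel in two sentences.",
]

# One prompt per line, matching how the direction-computation sketch reads them back.
with open("enhance.txt", "w") as f:
    f.write("\n".join(specialized))
with open("normal.txt", "w") as f:
    f.write("\n".join(general))
```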
Key Parameters
- `instructions`: Number of instruction samples to process (default: 500)
- `layer_idx`: Index of the model layer where the specialization direction is computed (default: 60% of total layers)
- `enhancement_factor`: Strength of enhancement along the specialization direction (default: 1.5)
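A sketch of how these parameters might be wired up, reusing names from the earlier sketches; the 60% default is resolved to an integer layer index against the model config:

```python
instructions = 500                                     # samples taken from each prompt file
layer_idx = int(0.6 * model.config.num_hidden_layers)  # 60% of depth, e.g. 19 for 32 layers
enhancement_factor = 1.5                               # scale of the rank-1 weight update

# Limit both prompt sets to the configured number of samples.
specialized_prompts = specialized_prompts[:instructions]
general_prompts = general_prompts[:instructions]
```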
Core Algorithm
```python
import torch

# Compute specialization direction (difference of mean hidden states, normalized)
specialization_dir = specialized_mean - general_mean
specialization_dir = specialization_dir / specialization_dir.norm()

# Core part of the weight enhancement algorithm
# (attn_output is the o_proj weight matrix of a single attention layer)
projection_scalars = torch.matmul(attn_output, specialization_dir)
projection = torch.outer(projection_scalars, specialization_dir)
enhanced_weights = attn_output + enhancement_factor * projection
```
Test
```bash
ollama create deepeesk-aijoah -f Modelfile
```
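For reference, a minimal Modelfile for this step could look like the sketch below; the GGUF file name is hypothetical, so point `FROM` at the quantized file actually downloaded from this repository:

```
# Minimal Modelfile: point FROM at the downloaded GGUF file (name is illustrative)
FROM ./DeepSeek-R1-Distill-Llama-8B-Code-De-AIJOAH-Q4_K_M.gguf
```

After `ollama create` completes, the model can be tried interactively with `ollama run deepeesk-aijoah`.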
Conclusion
The Directional Enhancement technique provides an efficient way to strengthen specific capabilities of language models without requiring full retraining or additional training data. While it does not introduce new knowledge, it amplifies latent abilities with minimal computational cost.
This method offers a practical approach for developing AI models specialized in tasks involving speculative code reasoning and analysis.
If the data-related queries are refined in more detail, the extracted hidden states may improve; further research is needed to explore this possibility. Additionally, since the hidden-state information varies depending on the point at which the enhanced model is created, its performance may be influenced by such factors.
With short queries in `enhance.txt`, the enhanced model sometimes produces weaker deep-thinking results than the original model; when this happens, further investigation into better `enhance.txt` queries is needed.
Therefore, it is not always easy to define an optimal configuration. However, observations so far suggest that the technique operates in intriguing ways and demonstrates compelling behavior in its results.
Citation
```bibtex
@misc{DirectionalEnhancement2025,
  title={Directional Enhancement for Language Models: A Novel Approach to Specialization without Fine-Tuning},
  author={AI JOAH},
  year={2025},
  url={https://www.youtube.com/@JayLee-gv8tv},
}
```
Contact
- AI JOAH : [email protected]