Code Directional Enhancement for Language Models: A Novel Approach to Specialization without Fine-Tuning

"Even though My experiments and ideas may seem unconventional, wouldn't it be significant if they proved to be effective?

After all, nothing starts out perfect.

The vast realm of AI is like a great wall—while we may not be able to completely cross it, isn't simply climbing up and seeing beyond it still a step forward?

What I am doing now is an attempt to provide a path that allows us to look beyond that wall.

May divine blessings and great wealth be upon all AI researchers who dedicate themselves to exploring these frontiers and pushing the boundaries of the unknown."

This Model by "AI JOAH"

Overview

This model was made by muzerai, aka "AI JOAH", using deepseek-ai/DeepSeek-R1-Distill-Llama-8B (for testing purposes).

Subscribe to my YouTube Channel AI JOAH

This project presents a methodology for enhancing specific capabilities of language models using the Directional Enhancement technique. The approach does not introduce new knowledge into the model; it amplifies existing latent abilities. While preserving the model's general capabilities, it significantly improves performance in specific domains such as code reasoning.

This is a speculative code reasoning enhancement version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B.

If enhance.txt is changed to target a different domain, the same approach can be adapted accordingly. This test uses 500 instructions to specialize in code reasoning.

Dataset reference for the 500 samples (instructions only): jtatman/python-code-dataset-500k.

Technical Background

Principle of Directional Enhancement

This approach identifies a specialization direction in the representation space of the language model, associated with a specific capability, and enhances the model’s attention weights in that direction.

  1. Compute the difference in representation between specialized prompts (domain-specific) and general prompts within the model's hidden states.
  2. Normalize this difference vector to obtain the specialization direction.
  3. Enhance the model’s self-attention output projection weights (o_proj) along this specialization direction.

This method strengthens the model’s intrinsic abilities rather than introducing completely new knowledge or patterns. It functions similarly to how a lens amplifies a specific wavelength of light.

Computing Specialization Direction

Unlike conventional fine-tuning, which modifies all weights in the model, this approach identifies a targeted enhancement direction by analyzing differences in activations across specialized and general inputs.

  • A set of specialized prompts (enhance.txt) and general prompts (normal.txt) are fed into the model.
  • The activations of a chosen hidden layer are extracted for both prompt types.
  • The mean hidden state vector for specialized prompts is computed and compared to the mean hidden state vector for general prompts.
  • Their difference represents the specialization direction, which is then normalized to create a unit vector.
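
Below is a minimal sketch of this computation using the Hugging Face transformers API. The inline example prompts, the use of the final token's hidden state, and the exact layer indexing are illustrative assumptions, not the exact script used to build this model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

layer_idx = int(len(model.model.layers) * 0.6)  # direction is measured at ~60% of the decoder depth

def mean_hidden_state(prompts):
    # Average the chosen layer's hidden state of the final token across all prompts.
    states = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        # hidden_states[0] is the embedding output, so layer_idx + 1 selects decoder layer layer_idx
        states.append(out.hidden_states[layer_idx + 1][0, -1, :].float())
    return torch.stack(states).mean(dim=0)

# Illustrative prompts only; the real run uses 500 instructions from enhance.txt / normal.txt.
specialized_prompts = ["Write a Python function that parses a CSV file and returns a dict."]
general_prompts = ["Describe your favorite season in a few sentences."]

specialized_mean = mean_hidden_state(specialized_prompts)
general_mean = mean_hidden_state(general_prompts)

specialization_dir = specialized_mean - general_mean
specialization_dir = specialization_dir / specialization_dir.norm()  # unit vector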

Enhancing Model Weights

Once the specialization direction is computed, it is applied to modify the model’s self-attention output projection weights (o_proj) in a controlled manner:

  1. The specialization direction is projected onto the weight matrix of each attention layer.
  2. A scaled enhancement factor is applied to align the model’s attention outputs more strongly with the specialization direction.
  3. This process amplifies the model’s responses in the desired direction without altering its fundamental structure.

This targeted adjustment allows the model to focus more on specific characteristics (e.g., code reasoning) while maintaining general competency.
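
As a toy worked example (with made-up numbers, not values from the actual model), the projection step only rescales the part of a weight row that already points along the specialization direction:

import torch

d = torch.tensor([1.0, 0.0, 0.0])         # unit specialization direction (toy values)
w_row = torch.tensor([0.5, 0.2, -0.1])    # one row of an o_proj weight matrix (toy values)
enhancement_factor = 1.5

component = torch.dot(w_row, d)                           # 0.5: the row's existing alignment with d
w_enhanced = w_row + enhancement_factor * component * d   # tensor([1.25, 0.20, -0.10])

The components orthogonal to the direction (0.2 and -0.1 above) are left untouched, which is why general competency is largely preserved.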

Implementation Details

Data Preparation

Two types of datasets are used to define the specialization direction:

  • Specialized Dataset (enhance.txt): Contains prompts focused on the capability to be enhanced.
  • General Dataset (normal.txt): Contains diverse, neutral prompts to serve as a baseline.

The difference in activations between these two datasets defines the specialization direction, ensuring that the enhancement is aligned with the target capability while preserving the model’s general functionality.
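
A minimal sketch of this preparation step, assuming one prompt per line in each file and the 500-instruction limit described under Key Parameters:

num_instructions = 500  # matches the "instructions" parameter below

with open("enhance.txt", encoding="utf-8") as f:
    specialized_prompts = [line.strip() for line in f if line.strip()][:num_instructions]

with open("normal.txt", encoding="utf-8") as f:
    general_prompts = [line.strip() for line in f if line.strip()][:num_instructions]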

Key Parameters

  • instructions: Number of instruction samples to process (default: 500)
  • layer_idx: Index of the model layer where specialization direction is computed (default: 60% of total layers)
  • enhancement_factor: Strength of enhancement along the specialization direction (default: 1.5)
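
For an 8B Llama-based checkpoint such as this one (32 decoder layers), the defaults would resolve roughly as follows; the variable names are illustrative:

num_layers = len(model.model.layers)   # 32 for this Llama-based 8B model
layer_idx = int(num_layers * 0.6)      # -> 19: layer where the specialization direction is computed
enhancement_factor = 1.5               # strength of the enhancement along the direction
num_instructions = 500                 # instruction samples drawn from each prompt file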

Core Algorithm

import torch

# Compute specialization direction
# specialized_mean / general_mean: mean hidden-state vectors of the specialized and general prompt sets
specialization_dir = specialized_mean - general_mean
specialization_dir = specialization_dir / specialization_dir.norm()  # normalize to a unit vector

# Core part of the weight enhancement algorithm
# attn_output: the o_proj weight matrix of one attention layer
projection_scalars = torch.matmul(attn_output, specialization_dir)   # per-row alignment with the direction
projection = torch.outer(projection_scalars, specialization_dir)     # rank-1 update along that direction
enhanced_weights = attn_output + enhancement_factor * projection     # enhancement_factor defaults to 1.5
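
The snippet above operates on a single weight matrix. A hedged sketch of how it could be applied to every decoder layer of a Llama-style checkpoint and then saved is shown below; the module path model.model.layers[i].self_attn.o_proj follows the standard Hugging Face Llama layout, and the exact loop used for this model may differ. The model, tokenizer, and specialization_dir are assumed from the earlier sketch.

import torch

def enhance_o_proj_weights(model, specialization_dir, enhancement_factor=1.5):
    # Amplify each attention layer's o_proj weight matrix along the specialization direction.
    for layer in model.model.layers:
        weight = layer.self_attn.o_proj.weight.data                # shape: [hidden_size, hidden_size]
        direction = specialization_dir.to(device=weight.device, dtype=weight.dtype)
        projection_scalars = torch.matmul(weight, direction)       # per-row alignment with the direction
        projection = torch.outer(projection_scalars, direction)    # rank-1 update along that direction
        weight += enhancement_factor * projection                  # in-place update of the checkpoint weights

enhance_o_proj_weights(model, specialization_dir)
model.save_pretrained("DeepSeek-R1-Distill-Llama-8B-Code-De-AIJOAH")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Llama-8B-Code-De-AIJOAH")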

Test

(.venv) jaylee@lees-MacBook-Pro-2 coding % python test.py

Device in use: mps
Test prompt: Write code to find the sum of all prime numbers between 1 million and 2 million, excluding prime numbers that contain the digit 7.

=== Original Model Test ===
Loading deepseek-ai/DeepSeek-R1-Distill-Llama-8B model...
Loading checkpoint shards:   0%|                                                                                                      | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00,  1.82s/it]

Generating... (max 1500 tokens)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
<|begin▁of▁sentence|><|User|>Write code to find the sum of all prime numbers between 1 million and 2 million, excluding prime numbers that contain the digit 7.<|Assistant|><think>
Okay, so I need to write a Python program to find the sum of all prime numbers between 1 million and 2 million, but I have to exclude any primes that contain the digit 7. Hmm, that sounds a bit challenging, but I think I can figure it out step by step.

First, I remember that prime numbers are numbers greater than 1 that have no divisors other than 1 and themselves. So, I need a way to check if a number is prime. Maybe I can use the Sieve of Eratosthenes algorithm for this? That algorithm is efficient for finding primes up to a certain limit.

Wait, but the Sieve of Eratosthenes is usually used for numbers up to n, right? In this case, I need primes between 1 million and 2 million. So, maybe I can generate all primes up to 2 million and then filter out those between 1 million and 2 million. That makes sense.

But I also need to exclude primes that contain the digit 7. Hmm, so after generating all primes in that range, I have to go through each prime and check if any of its digits is a 7. If it doesn't have a 7, I include it in the sum; if it does, I exclude it.

Alright, let's break this down.

First, I'll implement the Sieve of Eratosthenes. I need to create a boolean list where each index represents whether the number is prime. The sieve starts by assuming all numbers are prime, then marks the non-primes.

So, I'll start by initializing a list of booleans with size 2,000,001 (since we're going up to 2 million). Wait, no, actually, 2 million is the upper limit, but the sieve needs to go up to that number to mark all non-primes. So, I'll set the sieve size to 2,000,001 because indexing starts from 0.

But wait, actually, the sieve needs to include numbers up to 2,000,000. So, the sieve list should be of size 2,000,001, right? Because the indices go from 0 to 2,000,000. Yeah, that's correct.

Next, I'll set all entries in the sieve to True initially, except index 0 and 1, which are not primes. Then, starting from 2, I'll mark all multiples of each prime as non-prime.

Once the sieve is complete, I can collect all the primes in the range 1,000,000 to 2,000,000. But wait, actually, the sieve will mark all primes up to 2 million, so I need to iterate through all primes from 2 to 2,000,000 and then filter those between 1,000,000 and 2,000,000.

Hold on, no. The sieve will mark all primes up to 2 million, so when I collect the primes, I can just look at those in the specified range. So, I can loop from 1,000,000 to 2,000,000, and for each number, check if it's prime using the sieve. If it is, then check if it contains the digit 7. If it doesn't, add it to the sum.

But wait, that might not be the most efficient way because looping through every number between 1 million and 2 million and checking if it's prime could be slow. Alternatively, I could collect all primes up to 2 million first and then filter those in the desired range. That might be more efficient.

Yes, that sounds better. So, after generating the sieve, I can collect all primes into a list. Then, from that list, I can extract the primes that are between 1 million and 2 million. Then, from those, I can exclude any primes that have a 7 in their digits.

So, step by step:

1. Generate the sieve up to 2,000,000.
2. Collect all primes from the sieve into a list called primes.
3. Filter this list to get only those primes where the number is >= 1,000,000 and <= 2,000,000.
4. For each prime in this filtered list, check if any digit is 7.
5. Sum all primes that pass the digit check.

Now, let's think about how to implement each step.

First, the sieve. The sieve of Eratosthenes in Python can be implemented with a boolean list. Let's write that.

Initialize sieve = [True] * (max_num + 1), where max_num is 2,000,000. Then, set sieve[0] and sieve[1] to False. Then, for each number p starting from 2, if sieve[p] is True, then mark all multiples of p starting from p*p (or p*2, but p*p is more efficient) as False.

Wait, but for p in range 2 to sqrt(2,000,000), that's about 1414. So, the sieve will run efficiently.

Once sieve is complete, collect all primes by iterating from 2 to 2,000,000 and adding p to primes if sieve[p] is True.

Then, filter primes_in_range = [p for p in primes if 1,000,000 <= p <= 2,000,000]

Then, for each prime in primes_in_range, check if '7' is in the string representation of the prime.

So, for example, for prime = 1234567, str(prime) would be '1234567', which contains '7', so it would be excluded.

So, in code, for each p in primes_in_range:

if '7' not in str(p):

sum += p

That should work.

Now, considering the size of the sieve, 2,000,000 is manageable in Python, but it might take some time to generate. Let's see, 2 million is not too bad for sieve operations.

Wait, but in Python, creating a list of 2,000,001 booleans is about 2MB (since each boolean is a byte, but actually, in Python, a list of booleans uses more memory because each element is a reference, but I think it's manageable). So, I think it's feasible.

Alternatively, if the sieve is too slow, maybe I can find another method, but I think the sieve is the most straightforward.

Now, let's outline the code:

1. Define max_num = 2,000,000.

2. Create sieve = [True] * (max_num + 1)

3. sieve[0] = sieve[1] = False

4. for p in range(2, int(max_num**0.5) + 1):

   if sieve[p]:

      for multiple in range(p*p, max_num + 1, p):

          sieve[multiple] = False

5. Collect all primes: primes = [p for p in range(2, max_num + 1) if sieve[p]]

6. Filter primes_in_range = [p for p in primes if p >=
..........






=== Enhanced Model Test ===
Loading DeepSeek-R1-Distill-Llama-8B-Code-De-AIJOAH model...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:14<00:00,  3.74s/it]

Generating... (max 1500 tokens)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin▁of▁sentence|><|User|>Write code to find the sum of all prime numbers between 1 million and 2 million, excluding prime numbers that contain the digit 7.<|Assistant|><think>
Alright, so I have this problem where I need to find the sum of all prime numbers between 1 million and 2 million, but I have to exclude any primes that contain the digit 7. Hmm, okay. Let me think about how to approach this.

First, I know that prime numbers are numbers greater than 1 that have no divisors other than 1 and themselves. So, I need to check each number in the range 1,000,000 to 2,000,000 to see if it's prime. But wait, that's a lot of numbers! There must be a smarter way than checking each one individually.

I remember something about the Sieve of Eratosthenes, which is an algorithm used to find all primes up to a certain limit. But the standard sieve is good for smaller numbers because it requires a boolean array where each index represents whether a number is prime. However, if I try to create an array for 2 million, that's going to be 2 million booleans, which is manageable, but might take up some memory. Let me see: 2 million is about 2MB if using bits, but in reality, each boolean is usually a byte, so that's 2MB. That's manageable on most computers.

But wait, the problem isn't just to find primes; it's to exclude primes that contain the digit 7. So, after generating all primes in that range, I need to go through each prime and check if any of their digits is 7. If they don't have a 7, I add them to the sum.

So, my plan is:

1. Generate all prime numbers between 1,000,000 and 2,000,000 using the Sieve of Eratosthenes.
2. For each prime, check if it contains the digit 7.
3. If it doesn't contain 7, add it to the sum.

But wait, generating all primes up to 2 million might be a bit memory-intensive. Let me think if there's a better way. Alternatively, maybe I can iterate through the numbers and check for primes on the fly, but that might be slower because checking each number for primality could take time.

Alternatively, perhaps I can use a segmented sieve approach, which is more efficient for finding primes in a specific range without generating all primes up to that limit. But I'm not sure how to implement that quickly.

Hmm, maybe for simplicity, I can first generate all primes up to 2 million and then filter out those containing the digit 7. Let's go with the sieve method first.

Okay, so the Sieve of Eratosthenes works by creating a boolean array where each index represents whether the number is prime. We start with 0 and 1 as non-prime, then mark multiples of each prime starting from 2.

So, I can create an array of size 2,000,001 (since the upper limit is 2,000,000) initialized to True, except for indices 0 and 1, which are False.

Then, for each number starting from 2, if it's still marked as prime (True), I mark all its multiples as non-prime.

After the sieve is complete, I'll have a list of primes between 1 and 2,000,000. Then, I need to extract those between 1,000,000 and 2,000,000.

Once I have that list, I need to filter out any primes that contain the digit 7. How do I check if a number contains a specific digit? Well, I can convert the number to a string and check if '7' is in the string.

So, for each prime in the filtered list, I'll convert it to a string, check if '7' is present. If not, add it to the sum.

Let me outline the steps in code:

1. Initialize the sieve array.
2. Mark non-primes.
3. Collect primes in the range.
4. For each prime, check for digit '7'.
5. Sum those without '7'.

Wait, but 2,000,000 is a large number. How long will this take? On a typical computer, it might take a few seconds or minutes. But I think it's manageable.

Now, let me think about the sieve implementation.

In Python, creating a list of 2,000,001 booleans is feasible. Each element is a byte, so 2MB is not too bad.

But in Python, for efficiency, sometimes using a bit array is better, but for clarity, using a list of booleans is straightforward.

So, let me write the sieve:

Initialize sieve = [True] * (upper_limit + 1)
sieve[0] = sieve[1] = False
for i in range(2, int(upper_limit**0.5) + 1):
    if sieve[i]:
        sieve[i*i : upper_limit+1 : i] = [False] * len(range(i*i, upper_limit+1, i))

But in Python, the assignment sieve[i*i : ... ] = [False] * ... is efficient because it's a slice assignment. So that should work.

Once the sieve is complete, collect all primes in the range:

primes = [i for i in range(1000000, 2000001) if sieve[i]]

Then, for each prime, check if '7' is in its string representation:

sum_primes = 0
for prime in primes:
    if '7' not in str(prime):
        sum_primes += prime

That should give the desired sum.

But let me think about the efficiency. For 2 million numbers, the sieve should be manageable. The sieve is O(n log log n), which for 2 million is about 2 million iterations, which is acceptable.

But wait, in the sieve, the sieve is built up to 2,000,000, but the primes are only those between 1,000,000 and 2,000,000. So, perhaps we can optimize the sieve to only go up to sqrt(2,000,000) for the sieve, but no, the sieve needs to go up to sqrt(2,000,000) for the inner loop, but the sieve array needs to cover up to 2,000,000.

Alternatively, maybe we can optimize the sieve by noting that for numbers beyond 1,000,000, we can handle them differently, but I think it's not necessary.

Wait, another thought: when generating the sieve, we can limit the sieve to 2,000,000, but perhaps we can skip some steps. For example, when i is greater than sqrt(2,000,000), we can break early because the multiples would exceed 2,000,000.

Wait, sqrt(2,000,000) is approximately 1414. So, when i exceeds 1414, the square of i would be greater than 2,000,000. So, in the sieve, we can stop the loop at i=1414.

Yes, that's correct. So, in the sieve loop, we can go up to min(i_max,
.......
The original model and enhanced model follow a similar logical approach to solving the problem. Both use the Sieve of Eratosthenes to find prime numbers and then filter out those containing the digit '7'. However, there are key differences:
* The enhanced model considers Segmented Sieve as a potential optimization, while the original model strictly follows the traditional sieve method.
* The enhanced model is more conscious of memory usage, suggesting bit arrays for better efficiency.
* The enhanced model also explores ways to minimize unnecessary iterations, specifically stopping sieve computations after reaching sqrt(2,000,000) (≈1414).
* The original model implements a straightforward solution without these additional optimizations.
Conclusion:
The enhanced model performs deeper optimization analysis and suggests more efficient methods, but ultimately, both models produce a similar solution structure. The next step would be to apply actual code-level optimizations like bit arrays or parallel processing to differentiate the enhanced model further. 🚀

Conclusion

The Directional Enhancement technique provides an efficient way to strengthen specific capabilities of language models without requiring full retraining or additional training data. While it does not introduce new knowledge, it amplifies latent abilities with minimal computational cost.

This method offers a practical approach for developing AI models specialized in tasks such as speculative code reasoning analysis.

If the prompts used for the specialized and general datasets are refined in more detail, they may yield better hidden-state results; further research is needed to explore this possibility. Additionally, since the extracted hidden-state information varies depending on the point at which the enhanced model is created, performance may be influenced by such timing factors.

Sometimes, when the enhance.txt direction is built from short queries, the enhanced model produces weaker deep-thinking results than the original model. Further investigation is needed in such cases to design a better enhance.txt query set.

Therefore, it is not always easy to define an optimal condition. However, observations suggest that this technique operates in intriguing ways, demonstrating compelling behavior in the results.

Citation

@misc{DirectionalEnhancement2025,
       title={Directional Enhancement for Language Models: A Novel Approach to Specialization without Fine-Tuning},
       author={AI JOAH},
       year={2025},
       url={https://www.youtube.com/@JayLee-gv8tv},
}

Contact
