---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
- mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
- Triangle104/DSR1-Distill-Qwen-7B-RP
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
- mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
- Triangle104/DSR1-Distill-Qwen-7B-RP
pipeline_tag: text-generation
library_name: transformers
---

# ZeroXClem/Qwen2.5-7B-DistilPrism

**Qwen2.5-7B-DistilPrism** is a **distillation- and reasoning-focused model merge** that combines multiple variations of DeepSeek-R1 distillations into a **refined, high-performance language model**. Built with the **Model Stock** merge method, this fusion captures the best attributes of **DeepSeek-R1-Distill-Qwen-7B** and its improved derivatives.

## 🚀 Merged Models

This model is a weighted merge of the following:

- [**huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2**](https://huggingface.co/huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2): An uncensored distillation of DeepSeek-R1, optimized to remove refusals and improve usability.
- [**mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1**](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1): A refined distillation that improves accuracy and robustness across various benchmarks.
- [**Triangle104/DSR1-Distill-Qwen-7B-RP**](https://huggingface.co/Triangle104/DSR1-Distill-Qwen-7B-RP): A composite merge of various distilled DeepSeek variants, serving as an essential ingredient for performance tuning.
- [**deepseek-ai/DeepSeek-R1-Distill-Qwen-7B**](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B): The foundation of this merge, representing the distilled form of DeepSeek-R1 optimized for efficiency and strong reasoning capabilities.

## 🧩 Merge Configuration

The following **YAML configuration** defines how these models were combined using **Model Stock**, ensuring **balanced contributions** from each source:

```yaml
# Merge configuration for ZeroXClem/Qwen2.5-7B-DistilPrism using Model Stock
name: ZeroXClem-Qwen2.5-7B-DistilPrism
merge_method: model_stock
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tokenizer_source: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
dtype: bfloat16
parameters:
  normalize: true
  rescale: true
models:
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
    parameters:
      weight: 0.3
  - model: mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
    parameters:
      weight: 0.25
  - model: Triangle104/DSR1-Distill-Qwen-7B-RP
    parameters:
      weight: 0.2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    parameters:
      weight: 0.25
```

### 🔑 Key Parameters

- **Normalization & Rescaling**: Ensures weight distributions remain balanced across all components.
- **Model Stock Merge Method**: Optimizes contribution from each model to retain the best attributes.
- **Weighted Blending**: The **abliterated** model receives the largest weight (0.3), with the **re-distilled** and base models at 0.25 each and the RP merge at 0.2, refining both alignment and general usability.
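
To reproduce the merge locally, the configuration above can be fed to mergekit's CLI. The snippet below is a minimal sketch: the file name `distilprism.yaml` is just an example, and the available flags may vary with your mergekit version.

```bash
# Install mergekit, then run the merge defined in the YAML above.
# Assumes the config is saved as distilprism.yaml (hypothetical file name).
pip install mergekit
mergekit-yaml distilprism.yaml ./Qwen2.5-7B-DistilPrism --cuda --lazy-unpickle
```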

---

## 🗣️ Inference

You can use the model for text generation as follows:

### Ollama

**[Quickstart Guide to Ollama here](https://aidev.zeroxclem.com/blog/08-setting-up-ollama)**. I recommend Ollama for daily-driver use, as it supports thinking tags.

```bash
ollama run hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism

# If you are using quants, copy the repo URL and replace 'huggingface.co/' with 'hf.co/', followed by the name of the quant.
```
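
Once the model has been pulled, you can also query it through Ollama's local HTTP API. A minimal sketch, assuming Ollama is running on its default port 11434 and the model tag matches the one used above:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism",
  "prompt": "Explain the significance of artificial intelligence in modern healthcare.",
  "stream": false
}'
```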

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the model name
model_name = "ZeroXClem/Qwen2.5-7B-DistilPrism"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline (dtype and device placement are inherited from the loaded model)
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
```
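
Because the merge inherits the DeepSeek-R1 distill chat template from its tokenizer, reasoning-style prompts are usually best issued through `apply_chat_template`. A minimal sketch, reusing the `model` and `tokenizer` loaded above (the sampling parameters are illustrative, not tuned):

```python
# Chat-style generation using the tokenizer's built-in chat template
messages = [
    {"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)

# Decode only the newly generated tokens (the reasoning trace plus the final answer)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```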

---

## 🎯 Use Case & Applications

**Qwen2.5-7B-DistilPrism** is designed for **efficient, high-quality text generation** with strong reasoning capabilities. It is well-suited for:

- **Advanced Reasoning & Problem Solving**: Excels in logic-heavy tasks and multi-step reasoning problems.
- **Conversational AI**: Optimized for **fluid, responsive dialogue**, reducing refusals and improving engagement.
- **Mathematical & Scientific Computation**: Enhanced **math & code generation abilities** compared to standard distillations.
- **Content Creation & Summarization**: Generates coherent and **contextually rich** text suitable for various applications.

---

## 📜 License

This model is released under the **Apache 2.0 License**, consistent with the license declared in the model card metadata.

---

## 📊 Benchmark Results (Coming Soon)

We are currently in the process of **quantizing and benchmarking** this model. Stay tuned for performance updates across:

- **IFEval (0-Shot)**
- **BBH (3-Shot)**
- **MATH (4-Shot)**
- **GPQA (0-Shot)**
- **MuSR (0-Shot)**
- **MMLU-PRO (5-Shot)**

---

## 💡 Tags

- `merge`
- `mergekit`
- `model_stock`
- `DeepSeek-R1`
- `Distillation`
- `abliterated`
- `re-distilled`
- `DeepSeek-R1-Distill-Qwen-7B`

---

## 🙏 Special Thanks

This project wouldn't be possible without the incredible contributions from:

- **[@huihui-ai](https://huggingface.co/huihui-ai)** – For developing **DeepSeek-R1-Distill-Qwen-7B-abliterated-v2**, a bold step towards improving model alignment.  
- **[@mobiuslabsgmbh](https://huggingface.co/mobiuslabsgmbh)** – For refining distillation techniques with **DeepSeek-R1-ReDistill-Qwen-7B-v1.1**.  
- **[@Triangle104](https://huggingface.co/Triangle104)** – For crafting innovative merges like **DSR1-Distill-Qwen-7B-RP**, an essential component in this blend.  
- **[@deepseek-ai](https://huggingface.co/deepseek-ai)** – For open-sourcing **DeepSeek-R1-Distill-Qwen-7B**, a foundation for reasoning advancements.  

And a heartfelt **thank you** to everyone in the **🤗 & Open-Source AI community** for their continued research, testing, and support. 💜🚀  

---


# 🔗 Additional Resources

- [Hugging Face Model Card](https://huggingface.co/ZeroXClem/Qwen2.5-7B-DistilPrism)
- [MergeKit Repository](https://github.com/ZeroXClem/mergekit)
- [DeepSeek AI Homepage](https://huggingface.co/deepseek-ai)
- [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)