---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
- mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
- Triangle104/DSR1-Distill-Qwen-7B-RP
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
- mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
- Triangle104/DSR1-Distill-Qwen-7B-RP
pipeline_tag: text-generation
library_name: transformers
---
# ZeroXClem/Qwen2.5-7B-DistilPrism
**Qwen2.5-7B-DistilPrism** is a **distillation- and reasoning-focused model merge** that combines multiple variations of DeepSeek-R1 distillations into a **refined, high-performance language model**. Built with the **Model Stock** merge method, the fusion captures the best attributes of **DeepSeek-R1-Distill-Qwen-7B** and its improved derivatives.
## 🚀 Merged Models
This model is a weighted merge of the following:
- [**huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2**](https://huggingface.co/huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2): An uncensored distillation of DeepSeek-R1, optimized to remove refusals and improve usability.
- [**mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1**](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1): A refined distillation that improves accuracy and robustness across various benchmarks.
- [**Triangle104/DSR1-Distill-Qwen-7B-RP**](https://huggingface.co/Triangle104/DSR1-Distill-Qwen-7B-RP): A composite merge of various distilled DeepSeek variants, serving as an essential ingredient for performance tuning.
- [**deepseek-ai/DeepSeek-R1-Distill-Qwen-7B**](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B): The foundation of this merge, representing the distilled form of DeepSeek-R1 optimized for efficiency and strong reasoning capabilities.
## 🧩 Merge Configuration
The following **YAML configuration** defines how these models were combined using **Model Stock**, ensuring **balanced contributions** from each source:
```yaml
# Merge configuration for ZeroXClem/Qwen2.5-7B-DistilPrism using Model Stock
name: ZeroXClem-Qwen2.5-7B-DistilPrism
merge_method: model_stock
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tokenizer_source: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
dtype: bfloat16
parameters:
normalize: true
rescale: true
models:
- model: huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
parameters:
weight: 0.3
- model: mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
parameters:
weight: 0.25
- model: Triangle104/DSR1-Distill-Qwen-7B-RP
parameters:
weight: 0.2
- model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
parameters:
weight: 0.25
```
### 🔑 Key Parameters
- **Normalization & Rescaling**: Ensures weight distributions remain balanced across all components.
- **Model Stock Merge Method**: Optimizes contribution from each model to retain the best attributes.
- **Weighted Blending**: The **abliterated** and **re-distilled** models contribute the most, refining both alignment and general usability.
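To reproduce the merge locally, here is a minimal sketch using the `mergekit-yaml` CLI. It assumes `mergekit` is installed and the YAML above is saved as `config.yaml`; the output path is arbitrary:

```bash
# Sketch: run the merge defined in the YAML above with the mergekit CLI.
# The output directory name is arbitrary; --cuda offloads tensor math to GPU.
pip install mergekit
mergekit-yaml config.yaml ./Qwen2.5-7B-DistilPrism --cuda
```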
---
## 🗣️ Inference
You can use the model for text generation as follows:
### Ollama
**[Quickstart to Ollama Guide Here](https://aidev.zeroxclem.com/blog/08-setting-up-ollama)**. I recommend Ollama as a daily driver, as it supports thinking tags.
```bash
ollama run hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism
# If you are using quants, copy the quant repo URL and replace 'huggingface.co/' with 'hf.co/', followed by the name of the quant.
```
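If you want to pin sampling parameters locally, a short Ollama Modelfile works. This is a sketch: it assumes the model has already been pulled with the command above, and the model name `distilprism-local` is arbitrary.

```bash
# Sketch: wrap the already-pulled model with fixed sampling parameters.
cat > Modelfile <<'EOF'
FROM hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism
PARAMETER temperature 0.7
PARAMETER top_p 0.95
EOF
ollama create distilprism-local -f Modelfile
ollama run distilprism-local
```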
### Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
# Define the model name
model_name = "ZeroXClem/Qwen2.5-7B-DistilPrism"
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the model
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
device_map="auto"
)
# Initialize the pipeline
text_generator = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
device_map="auto"
)
# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."
# Generate the output
outputs = text_generator(
prompt,
max_new_tokens=150,
do_sample=True,
temperature=0.7,
top_k=50,
top_p=0.95
)
# Print the generated text
print(outputs[0]["generated_text"])
```
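Because the merge sources its tokenizer from `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` (see `tokenizer_source` in the config above), conversational use should go through the tokenizer's chat template. A minimal sketch, reusing `model` and `tokenizer` from the block above; the prompt and sampling values are illustrative:

```python
# Sketch: chat-style generation via the tokenizer's built-in chat template.
messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # start generation at the assistant turn
    return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,  # illustrative sampling settings
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```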
---
## 🎯 Use Case & Applications
**Qwen2.5-7B-DistilPrism** is designed for **efficient, high-quality text generation** with strong reasoning capabilities. It is well-suited for:
- **Advanced Reasoning & Problem Solving**: Excels in logic-heavy tasks and multi-step reasoning problems.
- **Conversational AI**: Optimized for **fluid, responsive dialogue**, reducing refusals and improving engagement.
- **Mathematical & Scientific Computation**: Enhanced **math & code generation abilities** compared to standard distillations.
- **Content Creation & Summarization**: Generates coherent and **contextually rich** text suitable for various applications.
---
## 📜 License
This model is released under the **Apache 2.0 License**.
---
## 📊 Benchmark Results (Coming Soon)
We are currently in the process of **quantizing and benchmarking** this model. Stay tuned for performance updates across:
- **IFEval (0-Shot)**
- **BBH (3-Shot)**
- **MATH (4-Shot)**
- **GPQA (0-Shot)**
- **MuSR (0-Shot)**
- **MMLU-PRO (5-Shot)**
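These match the Open LLM Leaderboard v2 suite. As a sketch, they can be run locally with EleutherAI's `lm-evaluation-harness`; the `leaderboard` task group is assumed to be available in your installed version (check `lm_eval --tasks list`):

```bash
# Sketch: local evaluation with lm-evaluation-harness.
# The 'leaderboard' task group covers the benchmarks listed above in recent versions.
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=ZeroXClem/Qwen2.5-7B-DistilPrism,dtype=bfloat16 \
  --tasks leaderboard \
  --batch_size auto
```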
---
## 💡 Tags
- `merge`
- `mergekit`
- `model_stock`
- `DeepSeek-R1`
- `Distillation`
- `abliterated`
- `re-distilled`
- `DeepSeek-R1-Distill-Qwen-7B`
---
## 🙏 Special Thanks
This project wouldn't be possible without the incredible contributions from:
- **[@huihui-ai](https://huggingface.co/huihui-ai)** – For developing **DeepSeek-R1-Distill-Qwen-7B-abliterated-v2**, a bold step towards reducing refusals and improving usability.
- **[@mobiuslabsgmbh](https://huggingface.co/mobiuslabsgmbh)** – For refining distillation techniques with **DeepSeek-R1-ReDistill-Qwen-7B-v1.1**.
- **[@Triangle104](https://huggingface.co/Triangle104)** – For crafting innovative merges like **DSR1-Distill-Qwen-7B-RP**, an essential component in this blend.
- **[@deepseek-ai](https://huggingface.co/deepseek-ai)** – For open-sourcing **DeepSeek-R1-Distill-Qwen-7B**, a foundation for reasoning advancements.
And a heartfelt **thank you** to everyone in the **🤗 & Open-Source AI community** for their continued research, testing, and support. 💜🚀
---
# 🔗 Additional Resources
- [Hugging Face Model Card](https://huggingface.co/ZeroXClem/Qwen2.5-7B-DistilPrism)
- [MergeKit Repository](https://github.com/ZeroXClem/mergekit)
- [DeepSeek AI on Hugging Face](https://huggingface.co/deepseek-ai)
- [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)