Update README.md
README.md
CHANGED
@@ -1,129 +1,29 @@
---
license: mit
train: false
inference: true
pipeline_tag: text-generation
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

This is a version of the <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B">DeepSeek-R1-Distill-Qwen-7B</a> model re-distilled for better performance.

## Performance

| Models | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B">DeepSeek-R1-Distill-Qwen-7B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1">DeepSeek-R1-ReDistill-Qwen-7B-v1.1</a> |
|:-------------------:|:--------:|:----------------:|
| ARC (25-shot)       | <b>55.03</b> | 52.3 |
| HellaSwag (10-shot) | 61.9  | <b>62.36</b> |
| MMLU (5-shot)       | 56.75 | <b>59.53</b> |
| TruthfulQA-MC2      | 45.76 | <b>47.7</b>  |
| Winogrande (5-shot) | 60.38 | <b>61.8</b>  |
| GSM8K (5-shot)      | 78.85 | <b>83.4</b>  |
| Average             | 59.78 | <b>61.18</b> |

| Models | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B">DeepSeek-R1-Distill-Qwen-7B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1">DeepSeek-R1-ReDistill-Qwen-7B-v1.1</a> |
|:-------------------:|:--------:|:----------------:|
| GPQA (0-shot)            | 30.9         | <b>34.99</b> |
| MMLU PRO (5-shot)        | 28.83        | <b>31.02</b> |
| MUSR (0-shot)            | 38.85        | <b>44.42</b> |
| BBH (3-shot)             | 43.54        | <b>51.53</b> |
| IfEval (0-shot) - strict | <b>42.33</b> | 35.49 |
| IfEval (0-shot) - loose  | 30.31        | <b>38.49</b> |
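
These read like standard Open LLM Leaderboard-style tasks. As a hedged sketch of re-running one of them with EleutherAI's lm-evaluation-harness (the harness choice and the task name are assumptions; the card does not state how these numbers were produced):

```Python
# Hypothetical reproduction sketch: the harness choice and the task name are
# assumptions, not something this model card specifies.
import lm_eval  # pip install lm-eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
)
print(results["results"]["gsm8k"])  # per-task metrics dictionary
```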

## Usage

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

compute_dtype = torch.bfloat16
device = 'cuda'
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "What is 1.5+102.2?"
chat = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(chat.to(device), max_new_tokens=1024, do_sample=True)
print(tokenizer.decode(outputs[0]))
```

Output:
```
<|begin▁of▁sentence|><|User|>What is 1.5+102.2?<|Assistant|><think>
First, I need to add the whole number parts of the two numbers. The whole numbers are 1 and 102, which add up to 103.

Next, I add the decimal parts of the two numbers. The decimal parts are 0.5 and 0.2, which add up to 0.7.

Finally, I combine the whole number and decimal parts to get the total sum. Adding 103 and 0.7 gives me 103.7.
</think>

To add the numbers \(1.5\) and \(102.2\), follow these steps:

1. **Add the whole number parts:**
\[
1 + 102 = 103
\]

2. **Add the decimal parts:**
\[
0.5 + 0.2 = 0.7
\]

3. **Combine the results:**
\[
103 + 0.7 = 103.7
\]

**Final Answer:**
\[
\boxed{103.7}
\]<|end▁of▁sentence|>
```
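
The reasoning emitted between `<think>` and `</think>` precedes the final answer. A minimal sketch for keeping only the answer portion, assuming `outputs` and `tokenizer` from the Usage example above:

```Python
# Minimal sketch: drop the chain-of-thought emitted between <think> and </think>.
# Assumes `outputs` and `tokenizer` exist as in the Usage example above.
decoded = tokenizer.decode(outputs[0])
answer = decoded.split("</think>", 1)[-1]                  # text after the reasoning block
answer = answer.replace("<|end▁of▁sentence|>", "").strip()
print(answer)
```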

## HQQ
Run ~3.5x faster with <a href="https://github.com/mobiusml/hqq/">HQQ</a>. First, install the dependencies:
```
pip install hqq
```

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel
from hqq.core.quantize import *

#Params
device = 'cuda:0'
backend = "torchao_int4"
compute_dtype = torch.bfloat16 if backend=="torchao_int4" else torch.float16
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"

#Load the tokenizer and the full-precision model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa")

#Quantize the weights to 4-bit with HQQ
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=1)
AutoHQQHFModel.quantize_model(model, quant_config=quant_config, compute_dtype=compute_dtype, device=device)

#Patch the quantized layers for the chosen inference backend
from hqq.utils.patching import prepare_for_inference
prepare_for_inference(model, backend=backend, verbose=False)

#Generate (streaming)
from hqq.utils.generation_hf import HFGenerator
gen = HFGenerator(model, tokenizer, max_new_tokens=4096, do_sample=True, compile='partial').warmup()

prompt = "What is 1.5+102.2?"
out = gen.generate(prompt, print_tokens=True)

# #Generate (simple)
# from hqq.utils.generation_hf import patch_model_for_compiled_runtime
# patch_model_for_compiled_runtime(model, tokenizer, warmup=True)
```
---
base_model:
- mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
library_name: transformers
tags:
- DeepSeek-R1-Distill-Qwen-7B
language:
- en
pipeline_tag: text-generation
---

# Melvin56/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF

Original Model : [mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1)

All quants were made using the imatrix option.

| Model  | Size     | Params |
|:-------|:--------:|:------:|
| Q2_K_S | 2.82 GB  | 7.62B  |
| Q2_K   | 3.01 GB  | 7.62B  |
| Q3_K_M | 3.80 GB  | 7.62B  |
| Q4_0   | 4.43 GB  | 7.62B  |
| Q4_K_M | 4.68 GB  | 7.62B  |
| Q5_K_M | 5.45 GB  | 7.62B  |
| Q6_K   | 6.25 GB  | 7.62B  |
| Q8_0   | 8.10 GB  | 7.62B  |
| F16    | 15.23 GB | 7.62B  |
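
These files target llama.cpp and compatible runtimes. A minimal loading sketch using the `llama-cpp-python` bindings; the local filename and settings are assumptions, so check the repo's file list for the exact name:

```Python
# Minimal sketch, assuming the Q4_K_M quant was downloaded from this repo;
# the filename below is hypothetical, so check the repo's file list.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers when built with GPU support
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 1.5+102.2?"}],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```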