Melvin56 committed on
Commit fdcbf3b · verified · 1 Parent(s): 5746ffb

Update README.md

Files changed (1):
1. README.md +22 -122
README.md CHANGED
@@ -1,129 +1,29 @@
  ---
- license: mit
- train: false
- inference: true
- pipeline_tag: text-generation
  base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+ - mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
+ library_name: transformers
+ tags:
+ - DeepSeek-R1-Distill-Qwen-7B
+ language:
+ - en
+ pipeline_tag: text-generation
  ---
- This is a version of the <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B">DeepSeek-R1-Distill-Qwen-7B</a> model re-distilled for better performance.
-
- ## Performance
-
- | Models | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B">DeepSeek-R1-Distill-Qwen-7B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1">DeepSeek-R1-ReDistill-Qwen-7B-v1.1</a> |
- |:-------------------:|:--------:|:----------------:|
- | ARC (25-shot) | <b>55.03</b> | 52.3 |
- | HellaSwag (10-shot)| 61.9 | <b>62.36</b> |
- | MMLU (5-shot) | 56.75 | <b>59.53</b> |
- | TruthfulQA-MC2 | 45.76 | <b>47.7</b> |
- | Winogrande (5-shot)| 60.38 | <b>61.8</b> |
- | GSM8K (5-shot) | 78.85 | <b>83.4</b> |
- | Average | 59.78 | <b>61.18</b> |
-
- | Models | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B">DeepSeek-R1-Distill-Qwen-7B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1">DeepSeek-R1-ReDistill-Qwen-7B-v1.1</a> |
- |:-------------------:|:--------:|:----------------:|
- | GPQA (0-shot) | 30.9 | <b>34.99</b> |
- | MMLU PRO (5-shot) | 28.83 | <b>31.02</b> |
- | MUSR (0-shot) | 38.85 | <b>44.42</b> |
- | BBH (3-shot) | 43.54 | <b>51.53</b> |
- | IfEval (0-shot) - strict | <b>42.33</b> | 35.49 |
- | IfEval (0-shot) - loose | 30.31 | <b>38.49</b> |
-
- ## Usage
- ```Python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- # Runtime settings
- compute_dtype = torch.bfloat16
- device = 'cuda'
- model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"
-
- # Load the model and tokenizer
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
-
- # Build the chat prompt and generate
- prompt = "What is 1.5+102.2?"
- chat = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
- outputs = model.generate(chat.to(device), max_new_tokens=1024, do_sample=True)
- print(tokenizer.decode(outputs[0]))
- ```
-
- Output:
- ```
- <|begin▁of▁sentence|><|User|>What is 1.5+102.2?<|Assistant|><think>
- First, I need to add the whole number parts of the two numbers. The whole numbers are 1 and 102, which add up to 103.
-
- Next, I add the decimal parts of the two numbers. The decimal parts are 0.5 and 0.2, which add up to 0.7.
-
- Finally, I combine the whole number and decimal parts to get the total sum. Adding 103 and 0.7 gives me 103.7.
- </think>
-
- To add the numbers \(1.5\) and \(102.2\), follow these steps:
-
- 1. **Add the whole number parts:**
- \[
- 1 + 102 = 103
- \]
-
- 2. **Add the decimal parts:**
- \[
- 0.5 + 0.2 = 0.7
- \]
-
- 3. **Combine the results:**
- \[
- 103 + 0.7 = 103.7
- \]
-
- **Final Answer:**
- \[
- \boxed{103.7}
- \]<|end▁of▁sentence|>
- ```
-
- ## HQQ
- Run ~3.5x faster with <a href="https://github.com/mobiusml/hqq/">HQQ</a>. First, install the dependencies:
- ```
- pip install hqq
- ```
-
- ```Python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from hqq.models.hf.base import AutoHQQHFModel
- from hqq.core.quantize import *
-
- # Params
- device = 'cuda:0'
- backend = "torchao_int4"
- compute_dtype = torch.bfloat16 if backend=="torchao_int4" else torch.float16
- model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"
-
- # Load
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa")
-
- # Quantize
- quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=1)
- AutoHQQHFModel.quantize_model(model, quant_config=quant_config, compute_dtype=compute_dtype, device=device)
-
- # Optimize
- from hqq.utils.patching import prepare_for_inference
- prepare_for_inference(model, backend=backend, verbose=False)
-
- ############################################################
- # Generate (streaming)
- from hqq.utils.generation_hf import HFGenerator
- gen = HFGenerator(model, tokenizer, max_new_tokens=4096, do_sample=True, compile='partial').warmup()
-
- prompt = "If A equals B, and C equals B - A, what would be the value of C?"
- out = gen.generate(prompt, print_tokens=True)
-
- ############################################################
- # Generate (simple)
- # from hqq.utils.generation_hf import patch_model_for_compiled_runtime
- # patch_model_for_compiled_runtime(model, tokenizer, warmup=True)
-
- # prompt = "If A equals B, and C equals B - A, what would be the value of C?"
- # chat = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
- # outputs = model.generate(chat.to(device), max_new_tokens=8192, do_sample=True)
- # print(tokenizer.decode(outputs[0]))
- ```
+
+ # Melvin56/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF
+
+ Original Model: [mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1)
+
+ All quants were made using the imatrix option (a sketch of the typical recipe is shown below).
+
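A minimal sketch of how imatrix quants are typically produced with llama.cpp's `llama-imatrix` and `llama-quantize` tools, driven from Python. The file names and calibration data below are illustrative assumptions, not this repo's exact recipe:

```Python
import subprocess

# Assumed file names for illustration; substitute your own paths.
base_gguf = "DeepSeek-R1-ReDistill-Qwen-7B-v1.1-F16.gguf"  # full-precision GGUF
calib_txt = "calibration.txt"                              # plain-text calibration data

# 1) Collect an importance matrix over the calibration data.
subprocess.run(["llama-imatrix", "-m", base_gguf, "-f", calib_txt, "-o", "imatrix.dat"], check=True)

# 2) Quantize using the importance matrix (here: Q4_K_M).
subprocess.run(["llama-quantize", "--imatrix", "imatrix.dat",
                base_gguf, "DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf", "Q4_K_M"], check=True)
```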
+ | Model | Size (GB) | Params |
+ |:-------------------------------------------------|:-------------:|:-------:|
+ | Q2_K_S | 2.82 | 7.62B |
+ | Q2_K | 3.01 | 7.62B |
+ | Q3_K_M | 3.80 | 7.62B |
+ | Q4_0 | 4.43 | 7.62B |
+ | Q4_K_M | 4.68 | 7.62B |
+ | Q5_K_M | 5.45 | 7.62B |
+ | Q6_K | 6.25 | 7.62B |
+ | Q8_0 | 8.10 | 7.62B |
+ | F16 | 15.23 | 7.62B |
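
To run one of these quants locally, here is a minimal sketch using the `llama-cpp-python` bindings; the GGUF file name is an assumption, so substitute whichever quant you downloaded:

```Python
from llama_cpp import Llama

# Assumed local file name for illustration.
llm = Llama(
    model_path="DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 1.5+102.2?"}],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```

The same files also work with any other GGUF-compatible runtime, such as llama.cpp itself.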