R1 32b is much worse than QwQ ...
Do you experience that?
I can confirm it's bad if you compare it to QwQ ...
The system prompt that's floating around seems to help a lot (I transcribed it from a screenshot):
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
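If you use that prompt, here is a minimal sketch (my own, not from the screenshot) for splitting a reply back into its reasoning and answer parts:
```python
import re

def split_reply(reply: str) -> tuple[str, str]:
    """Split a reply produced under the <think>/<answer> prompt above."""
    think = re.search(r"<think>(.*?)</think>", reply, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    return (think.group(1).strip() if think else "",
            answer.group(1).strip() if answer else reply.strip())

print(split_reply("<think> 2+2 </think> <answer> 4 </answer>"))  # ('2+2', '4')
```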
Can you try unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf and compare the result with the bartowski files? I tried the Stellar Blade riddle (a side quest): Can you solve the puzzle with these equations?
( 4 @ 7 @ 8 = 285684 )
( 9 @ 3 @ 5 = 271542 )
( 6 @ 2 @ 7 = 121426 )
( 5 @ 6 @ 7 = ? )
And the unsloth version seems to solve it, but not the bartowski version; I am just loading them with defaults in LM Studio.
Can you give the full prompt?
The prompt is: Can you solve the puzzle with these equations?
( 4 @ 7 @ 8 = 285684 )
( 9 @ 3 @ 5 = 271542 )
( 6 @ 2 @ 7 = 121426 )
( 5 @ 6 @ 7 = ? )
The unsloth version solved this; I had problems with the bartowski version. But whether this is correct or not ...?
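For reference, one pattern that fits all three examples (my own reading of the riddle, not confirmed anywhere in this thread) is to concatenate a*b, b*c, and a*b + b*c, which would make 5 @ 6 @ 7 = 304272. A quick check:
```python
# Hypothetical rule for the "@" puzzle: concat(a*b, b*c, a*b + b*c).
def at(a: int, b: int, c: int) -> int:
    x, y = a * b, b * c
    return int(f"{x}{y}{x + y}")

assert at(4, 7, 8) == 285684  # 28 | 56 | 84
assert at(9, 3, 5) == 271542  # 27 | 15 | 42
assert at(6, 2, 7) == 121426  # 12 | 14 | 26
print(at(5, 6, 7))            # -> 304272
```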
I tried that prompt with the unsloth and bartowski models - Q4_K_M version
full command
R1 32b
llama-cli.exe --model DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0.6 -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>."
QwQ
build\bin\llama-cli.exe --model QwQ-32B-Preview-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --in-prefix "<|im_end|>\n<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n" -p "<|im_start|>system\nYou are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."
Example questions:
1
How many days are between 12-12-1971 and 18-4-2024?
answer: 19121 (see the check after this list)
2
Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what is inside. Yet, the label on the bag says "chocolate" and not "popcorn". Sam finds the bag. She had never seen the bag before. Sam reads the label. She believes that the bag is full of…
answer : popcorn
3
I have a boat with 3 free spaces. I want to transport a man, sheep and cat on the other side of the river. How to do that?
answer: one trip by boat (all three fit)
4
Two fathers and two sons go fishing. They each catch one fish. Together, they leave with four fish in total. Is there anything strange about this story?
answer: nothing special
5
I have a bowl with a small cup inside. I placed the bowl upside down on a table and then pick up the bowl to put it in the microwave. Where is that cup?
answer: on the table
6
Hello! I have multiple different files with different sizes,
I want to move files from disk 1 to disk 2, which has only 688 space available.
Without yapping, and being as concise as possible.
What combination of files gets me closer to that number?
The file sizes are: 36, 36, 49, 53, 54, 54, 63, 94, 94, 107, 164, 201, 361, 478
answer: the subset of file sizes that sums to exactly 688 (see the check below)
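Questions 1 and 6 have exact numeric answers, so here is a small sanity check for both (plain Python, my own addition):
```python
from datetime import date
from itertools import combinations

# Question 1: days between 12-12-1971 and 18-4-2024 (day-month-year order).
print((date(2024, 4, 18) - date(1971, 12, 12)).days)  # -> 19121

# Question 6: subsets of the file sizes that sum to exactly 688.
sizes = [36, 36, 49, 53, 54, 54, 63, 94, 94, 107, 164, 201, 361, 478]
exact = {tuple(sorted(c))
         for r in range(1, len(sizes) + 1)
         for c in combinations(sizes, r) if sum(c) == 688}
print(exact)  # non-empty: e.g. (49, 53, 54, 54, 478) sums to exactly 688
```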
Example answers: QwQ 32b Q4_K_M answers these correctly, but R1 32b Q4_K_M gets not even one right ... very strange ...
I think quantization is broken, or they trained R1 32b for benchmarks at least ... but according to their table, R1 32b should be far better than QwQ at reasoning and math.
HuggingChat with the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B model also fails the "Stellar Blade puzzle" (the unsloth gguf works...)
I think maybe temperature is a factor: testing this Stellar Blade puzzle with top-k=1, it got stuck repeating itself. The temp 0.5-0.6 and top-p 0.95 they recommend might let it get out of that. I'm still testing, but I think we're no longer in Kansas when it comes to this model.
Ok so I got Bartowski's Q8 to answer it correctly using temp 0.5 and top-p 0.95
I think I actually found a solution! ... the trick is to force the model to THINK longer - QwQ thinks longer on problems than R1 32b does.
This question I have already gotten right 5 times in a row ... the longest thinking took 9k tokens .. but it works ;)
How many days are between 12-12-1971 and 18-4-2024?
Still testing ....
look
https://pastebin.com/UrJu1DgK
EDIT - I was right, 10/10 answers are correct now, at least for math problems.
More testing ;)
I tested the dates question in LM Studio. Q4_K_M always made a mistake in the full-year count (calculating 53 full years), plus other math errors. Q6_K always gave me a correct reply. I think going below Q6_K is not a good idea for this model. I set temp to 0.09; for my purpose, precision matters more than avoiding repetition, since I can always increase temp if needed.
the trick is to force the model to THINK longer
Could you share how you forced it to think longer?
llama-cli.exe --model DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0.6 -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>."
Reasoning and math are better now, but still not as good as QwQ.
It's brittle with prompts. If you just ask the question, even the IQ3_M works fine.
./build/bin/llama-cli -no-cnv --no-escape -m models/DeepSeek-R1-Distill-Qwen-32B-IQ3_M.gguf \
-t 2 -ngl -1 -fa -c 16384 -nkvo -n -1 --temp 0.1 \
-p '<|User|>How many days are between 12-12-1971 and 18-4-2024?<|Assistant|>'
output:
<think>
First, I need to determine the number of days between December 12, 1971, and April 18, 2024. To do this, I'll break down the problem into smaller parts.
I'll start by calculating the number of full years between 1972 and 2023. This gives me 52 years. Among these, I need to identify how many are leap years because they have 366 days instead of the usual 365. Leap years occur every 4 years, so there are 13 leap years in this period.
Next, I'll calculate the total number of days contributed by these years. The 39 common years contribute 39 × 365 = 14,235 days, and the 13 leap years add 13 × 366 = 4,758 days. Adding these together gives 18,993 days.
Now, I'll account for the partial years. From December 12, 1971, to December 31, 1971, there are 20 days. From January 1, 2024, to April 18, 2024, there are 138 days. Adding these partial days to the total gives 18,993 + 20 + 138 = 19,151 days.
Finally, I'll add 1 day to include the end date in the count. This results in a total of 19,152 days between December 12, 1971, and April 18, 2024.
</think>
To determine the number of days between **12-12-1971** and **18-4-2024**, we'll break down the calculation into manageable steps.
### Step 1: Calculate the Number of Full Years
- **Start Year:** 1971
- **End Year:** 2024
The full years between 1972 and 2023 inclusive are:
\[ 2023 - 1972 + 1 = 52 \text{ years} \]
### Step 2: Identify Leap Years
A leap year occurs every 4 years. Between 1972 and 2023:
- **Number of Leap Years:**
\[ \left\lfloor \frac{2023 - 1972}{4} \right\rfloor + 1 = 13 \text{ leap years} \]
### Step 3: Calculate Total Days from Full Years
- **Common Years (365 days):** \( 52 - 13 = 39 \) years
- **Leap Years (366 days):** 13 years
\[
\begin{align*}
\text{Days from Common Years} &= 39 \times 365 = 14,235 \text{ days} \\
\text{Days from Leap Years} &= 13 \times 366 = 4,758 \text{ days} \\
\text{Total Days from Full Years} &= 14,235 + 4,758 = 18,993 \text{ days}
\end{align*}
\]
### Step 4: Calculate Days in Partial Years
- **From December 12, 1971, to December 31, 1971:**
\[
31 - 12 = 19 \text{ days}
\]
- **From January 1, 2024, to April 18, 2024:**
\[
31 \text{ (January)} + 29 \text{ (February)} + 31 \text{ (March)} + 18 \text{ (April)} = 109 \text{ days}
\]
\[
\text{Total Partial Days} = 19 + 109 = 128 \text{ days}
\]
### Step 5: Sum All Days
\[
\begin{align*}
\text{Total Days} &= \text{Days from Full Years} + \text{Partial Days} \\
&= 18,993 + 128 \\
&= 19,121 \text{ days}
\end{align*}
\]
### Final Answer
\[
\boxed{19121}
\]
thanks, will try it out
For information, with the prompt below, deepseek-r1:14b (Ollama version / Q4) finds the correct answer in 19s on my RTX 3090.
With these parameters:
Temp=0.5
Reasoning Effort=high (available in Open WebUI)
Top_p=0.9
How many days are between 12-12-1971 and 18-4-2024?
Think step by step.
Without adding "Think step by step", even deepseek-r1:70b (Ollama version / Q4) doesn't find the correct answer. Same thing for ChatGPT4-o1, which failed and found 19122 instead of 19121.
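For anyone scripting this outside Open WebUI, a minimal sketch of the same request through Ollama's REST API (temperature and top_p are standard Ollama options; "Reasoning Effort" is an Open WebUI setting, so it isn't passed here):
```python
import requests  # assumes a local Ollama server on the default port

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "deepseek-r1:14b",
    "prompt": "How many days are between 12-12-1971 and 18-4-2024?\nThink step by step.",
    "options": {"temperature": 0.5, "top_p": 0.9},
    "stream": False,
})
print(resp.json()["response"])  # the model's full reply, <think> block included
```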
I still don't understand. I downloaded bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf again just to test. With Temperature = 0.6, I got 19,486 days for the prompt "How many days are between 12-12-1971 and 18-4-2024?"
Using unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf with Temperature = 0.6 gives 19,121 days for the same prompt.
My very unfounded suspicion is that the use of imatrix does something wrong? Or something else is wrong?
Hmm, that's surprising.. can you try the Q6 in LM Studio? That one doesn't have imatrix applied..
Imatrix ruining it isn't impossible, but it's unlikely :S it's very hard for it to make things worse
For the record, with temperature 0.6 on Q5_K_M I get 19,122, which, if it includes the day itself, is a valid answer, but obviously still different.
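That off-by-one is just exclusive vs. inclusive counting of the end date:
```python
from datetime import date

d = (date(2024, 4, 18) - date(1971, 12, 12)).days
print(d, d + 1)  # 19121 excluding the end date, 19122 including it
```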
What does the official API return?
Oh my.. the official chat goes absolutely crazy trying to answer this question.. thought for 89 seconds, and eventually gets 19,121
Oh right but we're talking about distill, not R1 🤦♂️
I just tested DeepSeek-R1-Distill-Qwen-**14B**-Q5_K_M, and it almost always answers 19121. Also, when reading the thought process, there was a funny part: "...But I'm not sure about the exact Julian day numbers off the **top of my head**."
Btw, for the second question about popcorn, even gpt4o answers "chocolate". But this question is more for humans than AI, because we have eyes :)