R1 32b is much worse than QwQ ...
Do you experience that?
I can confirm it's bad if you compare it to QwQ ...
The system prompt that's floating around seems to help a lot (I transcribed it from a screenshot):
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
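If you use that prompt, here is a minimal sketch (my own, not from the screenshot) for splitting a reply back into its reasoning and answer parts:
```python
import re

def split_reply(reply: str) -> tuple[str, str]:
    """Split a reply produced under the <think>/<answer> prompt above."""
    think = re.search(r"<think>(.*?)</think>", reply, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    return (think.group(1).strip() if think else "",
            answer.group(1).strip() if answer else reply.strip())

print(split_reply("<think> 2+2 </think> <answer> 4 </answer>"))  # ('2+2', '4')
```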
Can you try unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf and compare the result with the bartowski files? I tried the Stellar Blade riddle (a side quest): Can you solve the puzzle with these equations?
( 4 @ 7 @ 8 = 285684 )
( 9 @ 3 @ 5 = 271542 )
( 6 @ 2 @ 7 = 121426 )
( 5 @ 6 @ 7 = ? )
And the unsloth version seems to solve it, but not the bartowski version; I am just loading them with defaults in LM Studio.
Can you give the full prompt?
The prompt is: Can you solve the puzzle with these equations?
( 4 @ 7 @ 8 = 285684 )
( 9 @ 3 @ 5 = 271542 )
( 6 @ 2 @ 7 = 121426 )
( 5 @ 6 @ 7 = ? )
The unsloth version solved this; I had problems with the bartowski version. But whether this is correct or not ...?
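For reference, one pattern that fits all three examples (my own reading of the riddle, not confirmed anywhere in this thread) is to concatenate a*b, b*c, and a*b + b*c, which would make 5 @ 6 @ 7 = 304272. A quick check:
```python
# Hypothetical rule for the "@" puzzle: concat(a*b, b*c, a*b + b*c).
def at(a: int, b: int, c: int) -> int:
    x, y = a * b, b * c
    return int(f"{x}{y}{x + y}")

assert at(4, 7, 8) == 285684  # 28 | 56 | 84
assert at(9, 3, 5) == 271542  # 27 | 15 | 42
assert at(6, 2, 7) == 121426  # 12 | 14 | 26
print(at(5, 6, 7))            # -> 304272
```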
I tried that prompt with the unsloth and bartowski models - Q4_K_M version
full command
R1 32b
llama-cli.exe --model DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0.6 -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>."
QwQ
build\bin\llama-cli.exe --model QwQ-32B-Preview-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --in-prefix "<|im_end|>\n<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n" -p "<|im_start|>system\nYou are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."
Example questions:
1
How many days are between 12-12-1971 and 18-4-2024?
answer: 19121 (see the check after this list)
2
Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what is inside. Yet, the label on the bag says "chocolate" and not "popcorn". Sam finds the bag. She had never seen the bag before. Sam reads the label. She believes that the bag is full of…
answer : popcorn
3
I have a boat with 3 free spaces. I want to transport a man, sheep and cat on the other side of the river. How to do that?
answer: one trip by boat (all three fit)
4
Two fathers and two sons go fishing. They each catch one fish. Together, they leave with four fish in total. Is there anything strange about this story?
answer: nothing special
5
I have a bowl with a small cup inside. I placed the bowl upside down on a table and then pick up the bowl to put it in the microwave. Where is that cup?
answer: on the table
6
Hello! I have multiple different files with different sizes,
I want to move files from disk 1 to disk 2, which has only 688 space available.
Without yapping, and being as concise as possible.
What combination of files gets me closer to that number?
The file sizes are: 36, 36, 49, 53, 54, 54, 63, 94, 94, 107, 164, 201, 361, 478
answer: the subset of file sizes that sums to exactly 688 (see the check below)
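Questions 1 and 6 have exact numeric answers, so here is a small sanity check for both (plain Python, my own addition):
```python
from datetime import date
from itertools import combinations

# Question 1: days between 12-12-1971 and 18-4-2024 (day-month-year order).
print((date(2024, 4, 18) - date(1971, 12, 12)).days)  # -> 19121

# Question 6: subsets of the file sizes that sum to exactly 688.
sizes = [36, 36, 49, 53, 54, 54, 63, 94, 94, 107, 164, 201, 361, 478]
exact = {tuple(sorted(c))
         for r in range(1, len(sizes) + 1)
         for c in combinations(sizes, r) if sum(c) == 688}
print(exact)  # non-empty: e.g. (49, 53, 54, 54, 478) sums to exactly 688
```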
Example answers: QwQ 32b Q4_K_M answers these correctly, but R1 32b Q4_K_M gets not even one right ... very strange ...
I think quantization is broken, or they trained R1 32b for benchmarks at least ... but according to their table, R1 32b should be far better than QwQ at reasoning and math.
HuggingChat with the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B model also fails the "Stellar Blade puzzle" (the unsloth gguf works...)
I think maybe temperature is a factor: testing this Stellar Blade puzzle with top-k=1, it got stuck repeating itself. The temp 0.5-0.6 and top-p 0.95 they recommend might let it get out of that. I'm still testing, but I think we're no longer in Kansas when it comes to this model.
Ok so I got Bartowski's Q8 to answer it correctly using temp 0.5 and top-p 0.95
I think I actually found a solution! ... the trick is to force the model to THINK longer - QwQ thinks longer on problems than R1 32b does.
This question I have already gotten right 5 times in a row ... the longest thinking took 9k tokens .. but it works ;)
How many days are between 12-12-1971 and 18-4-2024?
Still testing ....
look
https://pastebin.com/UrJu1DgK
EDIT - I was right, 10/10 answers are correct now, at least for math problems.
More testing ;)
I tested the dates question in LM Studio. Q4_K_M always made a mistake in the full-year count (calculating 53 full years), plus other math errors. Q6_K always gave me a correct reply. I think going below Q6_K is not a good idea for this model. I set temp to 0.09; for my purpose, precision matters more than avoiding repetition, since I can always increase temp if needed.
the trick is to force the model to THINK longer
Could you share how you forced it to think longer?
llama-cli.exe --model DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 16384 -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0.6 -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>."
Reasoning and math are better now, but still not as good as QwQ.
It's brittle with prompts. If you just ask the question, even the IQ3_M works fine.
./build/bin/llama-cli -no-cnv --no-escape -m models/DeepSeek-R1-Distill-Qwen-32B-IQ3_M.gguf \
-t 2 -ngl -1 -fa -c 16384 -nkvo -n -1 --temp 0.1 \
-p '<|User|>How many days are between 12-12-1971 and 18-4-2024?<|Assistant|>'
output:
<think>
First, I need to determine the number of days between December 12, 1971, and April 18, 2024. To do this, I'll break down the problem into smaller parts.
I'll start by calculating the number of full years between 1972 and 2023. This gives me 52 years. Among these, I need to identify how many are leap years because they have 366 days instead of the usual 365. Leap years occur every 4 years, so there are 13 leap years in this period.
Next, I'll calculate the total number of days contributed by these years. The 39 common years contribute 39 × 365 = 14,235 days, and the 13 leap years add 13 × 366 = 4,758 days. Adding these together gives 18,993 days.
Now, I'll account for the partial years. From December 12, 1971, to December 31, 1971, there are 20 days. From January 1, 2024, to April 18, 2024, there are 138 days. Adding these partial days to the total gives 18,993 + 20 + 138 = 19,151 days.
Finally, I'll add 1 day to include the end date in the count. This results in a total of 19,152 days between December 12, 1971, and April 18, 2024.
</think>
To determine the number of days between **12-12-1971** and **18-4-2024**, we'll break down the calculation into manageable steps.
### Step 1: Calculate the Number of Full Years
- **Start Year:** 1971
- **End Year:** 2024
The full years between 1972 and 2023 inclusive are:
\[ 2023 - 1972 + 1 = 52 \text{ years} \]
### Step 2: Identify Leap Years
A leap year occurs every 4 years. Between 1972 and 2023:
- **Number of Leap Years:**
\[ \left\lfloor \frac{2023 - 1972}{4} \right\rfloor + 1 = 13 \text{ leap years} \]
### Step 3: Calculate Total Days from Full Years
- **Common Years (365 days):** \( 52 - 13 = 39 \) years
- **Leap Years (366 days):** 13 years
\[
\begin{align*}
\text{Days from Common Years} &= 39 \times 365 = 14,235 \text{ days} \\
\text{Days from Leap Years} &= 13 \times 366 = 4,758 \text{ days} \\
\text{Total Days from Full Years} &= 14,235 + 4,758 = 18,993 \text{ days}
\end{align*}
\]
### Step 4: Calculate Days in Partial Years
- **From December 12, 1971, to December 31, 1971:**
\[
31 - 12 = 19 \text{ days}
\]
- **From January 1, 2024, to April 18, 2024:**
\[
31 \text{ (January)} + 29 \text{ (February)} + 31 \text{ (March)} + 18 \text{ (April)} = 109 \text{ days}
\]
\[
\text{Total Partial Days} = 19 + 109 = 128 \text{ days}
\]
### Step 5: Sum All Days
\[
\begin{align*}
\text{Total Days} &= \text{Days from Full Years} + \text{Partial Days} \\
&= 18,993 + 128 \\
&= 19,121 \text{ days}
\end{align*}
\]
### Final Answer
\[
\boxed{19121}
\]
thanks, will try it out
For information, with the prompt below, deepseek-r1:14b (Ollama version / Q4) finds the correct answer in 19s on my RTX 3090.
With these parameters:
Temp=0.5
Reasoning Effort=high (available in Open WebUI)
Top_p=0.9
How many days are between 12-12-1971 and 18-4-2024?
Think step by step.
Without adding "Think step by step", even deepseek-r1:70b (Ollama version / Q4) doesn't find the correct answer. Same thing for ChatGPT4-o1, which failed and found 19122 instead of 19121.
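For anyone scripting this outside Open WebUI, a minimal sketch of the same request through Ollama's REST API (temperature and top_p are standard Ollama options; "Reasoning Effort" is an Open WebUI setting, so it isn't passed here):
```python
import requests  # assumes a local Ollama server on the default port

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "deepseek-r1:14b",
    "prompt": "How many days are between 12-12-1971 and 18-4-2024?\nThink step by step.",
    "options": {"temperature": 0.5, "top_p": 0.9},
    "stream": False,
})
print(resp.json()["response"])  # the model's full reply, <think> block included
```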
I still don't understand. I downloaded bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf again just to test. With Temperature = 0.6, I got 19,486 days for the prompt "How many days are between 12-12-1971 and 18-4-2024?"
Using unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf with Temperature = 0.6 gives 19,121 days for the same prompt.
My very unfounded suspicion is that the use of imatrix does something wrong? Or something else is wrong?
Hmm, that's surprising.. can you try the Q6 in LM Studio? That one doesn't have imatrix applied..
Imatrix ruining it isn't impossible, but it's unlikely :S it's very hard for it to make things worse
For the record, with temperature 0.6 on Q5_K_M I get 19,122, which, if it includes the day itself, is a valid answer, but obviously still different.
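That off-by-one is just exclusive vs. inclusive counting of the end date:
```python
from datetime import date

d = (date(2024, 4, 18) - date(1971, 12, 12)).days
print(d, d + 1)  # 19121 excluding the end date, 19122 including it
```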
What does the official API return?
Oh my.. the official chat goes absolutely crazy trying to answer this question.. thought for 89 seconds, and eventually gets 19,121
Oh right but we're talking about distill, not R1 🤦♂️
I just tested DeepSeek-R1-Distill-Qwen-**14B**-Q5_K_M, and it almost always answers 19121. Also, when reading the thought process, there was a funny part: "...But I'm not sure about the exact Julian day numbers off the **top of my head**."
Btw, for the second question about popcorn, even gpt4o answers "chocolate". But this question is more for humans than AI, because we have eyes :)