Ba2han committed
Commit 438fe6f · verified · 1 Parent(s): 445eb0c

Create README.md

Files changed (1)
  1. README.md +21 -0
README.md ADDED
@@ -0,0 +1,21 @@
+ ---
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen2.5-7B
+ ---
+
+ # What is This Model?
+
+ - This is a "thinking" model: it generates only the reasoning for a prompt, not a final answer or solution.
+
+ - I believe proprietary models such as OpenAI o1 and Gemini 2.0 Flash Thinking work in two steps: a thinking stage followed by answer generation, so the thoughts serve as extra context for the model.
+
+ My argument is that open-release models may already be extremely refined, but they can still condition only on the user's input when generating the next token. Adding simple, concise, high-quality context should improve output quality. (Too much context, however, could dilute the attention of smaller models.)
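
The two-step idea described above can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: `Generate`, `build_answer_prompt`, `two_stage_answer`, the prompt layout, and the `max_thought_chars` cap are all hypothetical names and choices introduced here. Each stage is passed in as a plain callable, so any backend (this model for the thoughts, any other model for the answer) could be plugged in.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


def build_answer_prompt(question: str, thoughts: str, max_thought_chars: int = 2000) -> str:
    """Combine the user question with generated thoughts as extra context."""
    # Assumption: cap the thoughts so a long trace does not crowd out the question.
    trimmed = thoughts.strip()[:max_thought_chars]
    return (
        f"Question:\n{question.strip()}\n\n"
        f"Thoughts (context only):\n{trimmed}\n\n"
        "Answer:"
    )


def two_stage_answer(
    question: str,
    think: Generate,
    answer: Generate,
    max_thought_chars: int = 2000,
) -> str:
    """Run the thinking stage first, then feed its output to the answer stage."""
    thoughts = think(question)  # stage 1: thoughts only
    prompt = build_answer_prompt(question, thoughts, max_thought_chars)
    return answer(prompt)  # stage 2: answer conditioned on question + thoughts
```

Here `think` would wrap this model and `answer` would wrap whichever model produces the final response. The character cap is one simple way to keep the added context concise for smaller models.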
+
+ ## Issues:
+
+ - It can still generate an answer, even though it is meant to produce only thoughts.
+ - On rare occasions, it switches to Chinese.
+ - The output can be too long.
+ - The training dataset needs more examples and cleaning.