---
language:
- en
base_model:
- Qwen/Qwen2.5-7B
---
# What is This Model?
- This model is a "thinking" model: it focuses solely on reasoning and does not generate a final answer or solution.
- I believe proprietary models like GPT-o1 and Gemini 2.0 Flash Thinking actually work in two steps: a thinking stage followed by answer generation, so the thoughts provide extra context for the model.
- My argument is that even though we now have extremely refined open-release models, they are still bound to the user's input when generating the next token. Adding some simple, concise, high-quality context should improve output quality. (Too much context could also dilute the attention of smaller models.)
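The two-stage flow described above can be sketched as a simple prompt pipeline. This is a minimal illustration, not the card's prescribed usage: the prompt wording and the placeholder comments for the model calls are assumptions.

```python
def build_answer_prompt(user_input: str, thoughts: str) -> str:
    """Combine the user's question with the thinking model's output,
    so the answer model receives the thoughts as extra context."""
    return (
        f"Question:\n{user_input}\n\n"
        f"Thoughts (from a separate thinking model):\n{thoughts}\n\n"
        "Using the thoughts above as context, write the final answer."
    )

# Stage 1 (hypothetical): generate thoughts with this thinking model, e.g.
#   thoughts = thinking_model.generate(user_input)
# Stage 2: feed question + thoughts to any answer model.
thoughts = "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."
prompt = build_answer_prompt("What is 17 * 24?", thoughts)
```

The answer model then sees the question *and* the reasoning, rather than being bound to the user input alone.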
## Issues:
- It can still generate unintended answers.
- It rarely switches to Chinese.
- The output can be too long.
- The training dataset needs more examples and further cleaning.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324eabf05bd8a54c6eb1650/1cHpepP0zmML9etvBtUhs.png)