Question about output length

#59
by skylerr - opened

I set max_new_tokens=2048 and evaluated the model on MATH-500, but some responses get cut off before they finish. Compared to Qwen 2.5, the responses are now much longer. What can I do to shorten the generated answers?

I am facing a similar issue when evaluating this model. I observe that all of these reasoning models (o1, o3, DeepSeek, QwQ) take very long generations to reach EOS. I believe this is a consequence of how they were trained to reason, and there is not much we can do about it at the moment besides letting them generate many tokens...
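
In practice, the workaround is to raise the token budget and explicitly flag truncated responses so they can be scored (or retried) accordingly. Here is a minimal sketch using Hugging Face transformers; the model name is a placeholder for whatever model you are evaluating, and the budget of 8192 is just an illustrative value:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-Preview"  # placeholder; substitute the model under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models often need a much larger budget than 2048 new tokens.
outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    eos_token_id=tokenizer.eos_token_id,
)

new_tokens = outputs[0][inputs.shape[-1]:]
# If the final token is not EOS, generation was cut off by the budget.
truncated = new_tokens[-1].item() != tokenizer.eos_token_id
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
print("truncated:", truncated)
```

This doesn't make the answers shorter, but at least it tells you which samples were truncated instead of silently marking them wrong.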
