q1-3B-PRIME

q1-3B-PRIME is a small reasoning model trained with reinforcement learning.

It was trained using SmallThinker-3B-Preview as the base model (Qwen2.5-3B-Instruct fully fine-tuned on QwQ reasoning traces), yielding roughly a 22.5% improvement on the test set in 120 training steps. (Note: there is likely performance left on the table, since PRIME only saturates after 300+ steps.)
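
A minimal inference sketch using the Hugging Face transformers API, assuming the checkpoint loads like its Qwen2.5-3B-Instruct base (standard causal-LM weights plus the base model's chat template). The prompt and generation settings below are illustrative, not the evaluation setup used for the benchmarks.

```python
# Minimal inference sketch (illustrative; not the benchmark evaluation setup).
# Assumes the checkpoint loads like its Qwen2.5-3B-Instruct base: standard
# causal-LM weights plus the base model's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rawsh/q1-3B-PRIME"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").to(device)

# Example math prompt; the model typically emits a reasoning trace before the answer.
messages = [{"role": "user", "content": "What is the sum of the first 50 positive odd integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```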

Benchmark Performance

Math

| Model | AIME24 | AMC23 | MATH-500 |
|---|---|---|---|
| Qwen2.5-3B-Instruct | 6.67 | 45 | - |
| SmallThinker-3B-Preview | 16.667 | 57.5 | - |
| q1-3B-PRIME | 26.667 | 67.5 | 64.8 |
| Eurus-7B-PRIME | 26.667 | 57.8 | 79.2 |
| GPT-4o | 9.3 | 45.8 | 76.4 |

Coding

| Model | HumanEval | Leetcode |
|---|---|---|
| Qwen2.5-3B-Instruct | 74.4 | - |
| q1-3B-PRIME | 71.95 | 20.55 |
| GPT-4o | 90.2 | - |