rawsh/q1-3B-PRIME

rawsh committed 2 days ago

Commit

758d5c4

1 Parent(s): 0a92f64

Update README.md

Files changed (1)

README.md CHANGED

@@ -12,6 +12,8 @@ pipeline_tag: text-generation
 **q1-3B-PRIME**, a small reasoning model trained with reinforcement learning.
 # Benchmark Performance
 Math

 **q1-3B-PRIME**, a small reasoning model trained with reinforcement learning.
+Trained using SmallThinker-3B-Preview as the base model (Qwen2.5-3B-Instruct fully fine-tuned on QwQ reasoning traces), for a roughly 22.5% improvement on the test set in 120 training steps. (Note: there is still a lot of performance left on the table, since PRIME saturates after 300 steps.)
 # Benchmark Performance
 Math