rawsh committed
Commit 758d5c4 · verified · 1 Parent(s): 0a92f64

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -12,6 +12,8 @@ pipeline_tag: text-generation
 
  **q1-3B-PRIME**, a small reasoning model trained with reinforcement learning.
 
+ Trained using SmallThinker-3B-Preview as a base model (Qwen2.5-3B-Instruct full finetuned on QwQ reasoning traces) for a roughly ~22.5% improvement on the test set in 120 training steps. (Note: lots of performance left on the table since PRIME saturates after 300 steps.)
+
  # Benchmark Performance
 
  Math