---
base_model:
- Qwen/Qwen2.5-3B-Instruct
datasets:
- PRIME-RL/Eurus-2-RL-Data
language:
- en
pipeline_tag: text-generation
---

# q1-3B-PRIME

**q1-3B-PRIME** is a small reasoning model trained with reinforcement learning.

Trained from SmallThinker-3B-Preview (Qwen2.5-3B-Instruct fully fine-tuned on QwQ reasoning traces) as the base model, yielding roughly a 22.5% improvement on the test set in 120 training steps. (Note: substantial performance is still left on the table, since PRIME only saturates after 300+ steps.)
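
A minimal generation sketch with 🤗 Transformers (the model id below is a placeholder; substitute the actual Hugging Face repo path for this model):

```python
# Minimal text-generation sketch using the standard transformers API.
# MODEL_ID is a hypothetical placeholder; replace it with the published repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "q1-3B-PRIME"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```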

# Benchmark Performance

**Math**

| Model | AIME24 | AMC23 | MATH-500 |
|---------|--------|-------|-------|
| Qwen2.5-3B-Instruct | 6.67 | 45 | - |
| **q1-3B-PRIME** | **26.667** | **65** | - |
| SmallThinker-3B-Preview| 16.667 | 57.5 | - |
| GPT-4o | 9.3 | 45.8 | 76.4 |

**Coding**

| Model | HumanEval | Leetcode |
|---------|--------|-------|
| Qwen2.5-3B-Instruct | **74.4** | - |
| **q1-3B-PRIME** | 71.95 | **20.55** |
| GPT-4o | 90.2 | - |