---
base_model:
- Qwen/Qwen2.5-3B-Instruct
- PowerInfer/SmallThinker-3B-Preview
datasets:
- PRIME-RL/Eurus-2-RL-Data
language:
- en
pipeline_tag: text-generation
---
# q1-3B-PRIME
**q1-3B-PRIME** is a small reasoning model trained with reinforcement learning.
It was trained from SmallThinker-3B-Preview (Qwen2.5-3B-Instruct fully finetuned on QwQ reasoning traces) as the base model, yielding roughly a 22.5% improvement on the test set in 120 training steps. (Note: there is likely significant performance still on the table, since PRIME does not saturate until 300+ steps.)
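The model can be used as a standard causal-LM with Hugging Face `transformers`. A minimal usage sketch follows; the hub repo id `q1-3B-PRIME` is an assumption taken from the card title, so substitute the actual hub path when loading.

```python
# Minimal usage sketch for q1-3B-PRIME with Hugging Face transformers.
# MODEL_ID is a hypothetical hub path inferred from the card title.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "q1-3B-PRIME"  # assumption: replace with the real hub repo id


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Sample a single completion for a user prompt using the chat template."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    # Build the chat-formatted input and move it to the model's device.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("What is 7 * 8? Think step by step."))
```

Since the base model is a Qwen2.5 derivative, the tokenizer's built-in chat template handles the role formatting; no custom prompt wrapping is needed.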
# Benchmark Performance
## Math
| Model | AIME24 | AMC23 | MATH-500 |
|---------|--------|-------|-------|
| Qwen2.5-3B-Instruct | 6.67 | 45 | - |
| SmallThinker-3B-Preview | 16.667 | 57.5 | - |
| **q1-3B-PRIME** | **26.667** | **67.5** | 64.8 |
| Eurus-7B-PRIME | **26.667** | 57.8 | **79.2** |
| GPT-4o | 9.3 | 45.8 | 76.4 |
## Coding
| Model | HumanEval | Leetcode |
|---------|--------|-------|
| Qwen2.5-3B-Instruct | 74.4 | - |
| **q1-3B-PRIME** | 71.95 | 20.55 |
| GPT-4o | 90.2 | - |