DNA-R1

We introduce DNA-R1, a specialized reasoning model for the Korean language built on Microsoft's Phi-4. By applying large-scale reinforcement learning (RL) with the same methodology as DeepSeek-R1, we significantly enhanced the model's Korean reasoning capabilities. The model demonstrates deep understanding of Korean text and strong reasoning across mathematics, coding, and general reasoning tasks.

Training Methodology

Our training pipeline consists of three stages:

  • Stage 1: Initial SFT with a large Korean non-reasoning dataset (760k examples) reused from our DNA 1.0 8B Instruct training pipeline
  • Stage 2: Strategic integration of Korean reasoning patterns from DeepSeek R1 using a specialized Korean reasoning dataset (300k examples)
  • Stage 3: Reinforcement learning with GRPO on a combined Korean/English reasoning dataset, with format, accuracy, and language consistency as reward signals (see the reward sketch at the end of this section)

DNA-R1 has learned reasoning patterns specifically tailored to Korean, and demonstrates capabilities such as self-verification, reflection, and generation of long chains-of-thought (CoT). This is a significant milestone for Korean-language AI research.
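The exact reward functions used in Stage 3 are not published in this card. As a rough illustration of the three signal types named above, the sketch below scores a completion for template compliance, answer correctness, and staying in the target language; all names are our own assumptions, and the <think>/<answer> format is the one described later in this card.

```python
import re

# Expected completion shape: reasoning inside <think>, final answer inside <answer>
FORMAT_RE = re.compile(r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> template, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """Exact-match check on the <answer> span; a real verifier would
    normalize math expressions, run test cases, etc."""
    m = FORMAT_RE.match(completion.strip())
    return 1.0 if m and m.group(2).strip() == gold.strip() else 0.0

def language_consistency_reward(completion: str, target_lang: str = "ko") -> float:
    """Fraction of alphabetic characters that are Hangul syllables: a crude
    proxy that penalizes drifting out of the prompt's language mid-reasoning."""
    letters = [ch for ch in completion if ch.isalpha()]
    if not letters:
        return 0.0
    ratio = sum('가' <= ch <= '힣' for ch in letters) / len(letters)
    return ratio if target_lang == "ko" else 1.0 - ratio
```

In practice the three signals would be combined, for example as a weighted sum, into the single scalar reward that GRPO optimizes.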

Model Specifications

  • Developed by: Dnotitia Inc.
  • Supported Languages: Korean, English
  • Model Release Date: Mar 6, 2025
  • Number of Parameters: 14B
  • License: CC BY-NC 4.0

NOTICE (translated from Korean):

This model may be used for commercial purposes. If you wish to use it commercially, please reach out through the Contact us page on the Dnotitia website; after a brief consultation process, we will approve commercial use.

Technical Details

Multi-Stage Training Pipeline

We implemented a sophisticated training approach to enhance Phi-4's Korean reasoning capabilities:

  1. Initial Foundation (Stage 1): Supervised Fine-Tuning using our extensive Korean non-reasoning dataset from the established DNA 1.0 8B Instruct training pipeline
  2. Reasoning Integration (Stage 2): Specialized adaptation of DeepSeek R1's reasoning patterns with Korean-specific optimization through a meticulously curated dataset
  3. Advanced Refinement (Stage 3): Reinforcement learning optimization using GRPO to perfect reasoning in both Korean and English, with comprehensive reward signals for format structure, factual accuracy, and language consistency

This methodical approach enables DNA-R1 to develop sophisticated chain-of-thought (CoT) reasoning for complex problem solving, resulting in a model finely calibrated for Korean language reasoning while maintaining robust general capabilities.
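One detail the prose above leaves implicit: GRPO (Group Relative Policy Optimization) samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation, so no separate value network is needed. The sketch below shows that core computation under our own naming; it is an illustration, not Dnotitia's training code.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's scalar reward
    against its group (all completions sampled for the same prompt)."""
    # rewards: (group_size,)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-4)

def grpo_policy_loss(logp_new: torch.Tensor,
                     logp_old: torch.Tensor,
                     advantages: torch.Tensor,
                     clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate with a shared sequence-level advantage
    broadcast over the token dimension."""
    # logp_new, logp_old: (group_size, seq_len) per-token log-probs
    ratio = torch.exp(logp_new - logp_old)
    adv = advantages.unsqueeze(-1)  # (group_size, 1)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(ratio * adv, clipped).mean()
```

The full GRPO objective also adds a KL penalty toward a reference policy, omitted here for brevity.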

Performance Highlights

Our Korean-specific multi-stage training pipeline significantly enhances the Phi-4 base model's understanding of Korean context, reasoning depth, and response capabilities. The model excels at:

  • Generating nuanced Korean chains-of-thought (CoT)
  • Performing rigorous self-verification
  • Solving multi-step complex problems
  • Maintaining cultural and linguistic context in reasoning
  • Distinguishing between deep thinking and concise answers using the <think> and <answer> tags (a small parsing helper is sketched just below)
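Because completions separate deliberation from the final answer with these tags, a small helper (our own, not part of the model's API) suffices to split a decoded output:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from the <think>/<answer> spans;
    falls back to the raw text if the tags are absent."""
    think = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    return (think.group(1).strip() if think else "",
            answer.group(1).strip() if answer else output.strip())
```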

Evaluation Results

Below, we present evaluation results for DNA-R1 across math, coding, science, Korean, and general-performance benchmarks. Despite having only 14B parameters, DNA-R1 outperforms many larger models on a range of these benchmarks.

| Benchmark | Task | DNA-R1 (14B) | DeepSeek-R1-Distill-Qwen-14B | DeepSeek-R1-Distill-Qwen-32B | EXAONE-3.5-32B-Instruct | QwQ-32B-Preview | gpt-4o-0513 | o1-mini | o1-preview |
|---|---|---|---|---|---|---|---|---|---|
| GSM8K | Math | **92.49** | 88.63 | 82.64 | <u>91.9</u> | 82.41 | - | - | - |
| Math500 | Math | <u>89.4</u> | 88.2 | 87.4 | 75.8 | **92.2** | 75.8 | 85.6 | 81.4 |
| AIME2024 | Math | 53.3 | <u>69.7</u> | **72.6** | 6.67 | 50.0 | 8.6 | 64.0 | 40 |
| OlympiadBench (Math, EN) | Math | <u>59.94</u> | 56.82 | 55.34 | 38.58 | **62.17** | - | - | 59.2 |
| GPQA-Diamond | Science/Reasoning | <u>61.11</u> | 59.1 | 58.08 | 33.33 | 52.5 | 46.5 | 60 | **75.2** |
| LiveCodeBench | Coding | 50.58 | 59.88 | <u>61.65</u> | 19.8 | 59.12 | 50.48 | **72.75** | 59.14 |
| KMMLU-direct | Korean | <u>59.9</u> | 50.5 | 58.62 | 50.72 | **62.96** | - | - | - |
| KMMLU-hard | Korean | <u>36.65</u> | 25.34 | 33.67 | 25.46 | **37.98** | - | - | - |
| KoBEST | Korean | 83.05 | 74.32 | 78.53 | **86.54** | <u>85.93</u> | - | - | - |
| MMLU-Pro | General | <u>57.64</u> | 50.55 | **59.58** | - | 46.82 | - | - | - |
  • The highest score in each row is shown in bold, and the second-highest is underlined.
  • All benchmarks are evaluated with lm-eval and skythought-eval.
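For readers who want to reproduce a number, lm-eval (the EleutherAI lm-evaluation-harness) exposes a Python entry point. The snippet below is illustrative only: the task name, precision, and batch size are our assumptions, not the exact configuration behind the table above.

```python
import lm_eval

# Assumed settings for illustration; swap in the task/config you want to check.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dnotitia/DNA-R1,dtype=bfloat16",
    tasks=["gsm8k"],
    batch_size=8,
)
print(results["results"])
```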

Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained('dnotitia/DNA-R1')
model = AutoModelForCausalLM.from_pretrained('dnotitia/DNA-R1', device_map='auto')
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# The prompt below is Korean song lyrics about a mother who insists she
# dislikes jjajangmyeon, followed by the question: "My friend wrote this
# poem. Why did the friend's mother say she dislikes jjajangmyeon?"
conversation = [
    {"role": "user", "content": """
어려서부터 우리 집은 가난했었고
남들 다하는 외식 몇 번 한 적이 없었고
일터에 나가신 어머니 집에 없으면
언제나 혼자서 끓여 먹었던 라면
그러다 라면이 너무 지겨워서
맛있는 것 좀 먹자고 대들었었어
그러자 어머님이 마지못해 꺼내신
숨겨두신 비상금으로 시켜주신
짜장면 하나에 너무나 행복했었어
하지만 어머님은 왠지 드시질 않았어
어머님은 짜장면이 싫다고 하셨어
어머님은 짜장면이 싫다고 하셨어
야이야~야 그렇게 살아가고
그렇게 후회하고 눈물도 흘리고
야이야~야 그렇게 살아가고
너무나 아프고 하지만 다시 웃고
---
친구가 쓴 시인데, 여기서 친구의 어머니가 짜장면이 싫다고 하신 이유는?"""},
]
inputs = tokenizer.apply_chat_template(conversation,
                                       add_generation_prompt=True,
                                       return_dict=True,
                                       return_tensors="pt").to(model.device)
# Generous token budget so the long chain-of-thought is not truncated (adjust as needed)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=4096)
```

License

This model is released under the CC BY-NC 4.0 license. If you have any questions or commercial-use inquiries, please reach out through the Contact us page on the Dnotitia website.

Citation

If you use or discuss this model in your academic research, please cite it as follows:

@misc{dnar12025,
      title={DNA R1}, 
      author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
      year={2025},
      publisher={HuggingFace},
      url={https://huggingface.co/dnotitia/DNA-R1}
}