---
license: apache-2.0
---

# CTRL: Critic Training via Reinforcement Learning

CTRL-32B is a critic LLM finetuned from [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct).

- **Project Page:** https://critic-rl.github.io/
- **Paper:** https://arxiv.org/abs/2502.03492

# Citation

```bibtex
@article{xie2025teaching,
  title={Teaching Language Models to Critique via Reinforcement Learning},
  author={Xie, Zhihui and Chen, Liyu and Mao, Weichao and Xu, Jingjing and Kong, Lingpeng and others},
  journal={arXiv preprint arXiv:2502.03492},
  year={2025}
}
```