CTRL-32B / README.md
Zhihui's picture
Update README.md
19e505e
|
raw
history blame
593 Bytes
---
license: apache-2.0
---
# CTRL: Critic Training via Reinforcement Learning
CTRL-32B is a critic LLM finetuned from [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct).
- **Project Page:** https://critic-rl.github.io/
- **Paper:** https://arxiv.org/abs/2502.03492
# Citation
```bibtex
@article{xie2025teaching,
title={Teaching Language Models to Critique via Reinforcement Learning},
author={Xie, Zhihui and Chen, Liyu and Mao, Weichao and Xu, Jingjing and Kong, Lingpeng and others},
journal={arXiv preprint arXiv:2502.03492},
year={2025}
}
```