CTRL-32B / README.md
Zhihui's picture
Update README.md
19e505e
|
raw
history blame
593 Bytes
metadata
license: apache-2.0

CTRL: Critic Training via Reinforcement Learning

CTRL-32B is a critic LLM finetuned from Qwen2.5-Coder-32B-Instruct.

Citation

@article{xie2025teaching,
  title={Teaching Language Models to Critique via Reinforcement Learning},
  author={Xie, Zhihui and Chen, Liyu and Mao, Weichao and Xu, Jingjing and Kong, Lingpeng and others},
  journal={arXiv preprint arXiv:2502.03492},
  year={2025}
}