---
license: apache-2.0
library_name: vllm
pipeline_tag: text-generation
---

# CTRL: Critic Training via Reinforcement Learning

CTRL-32B is a critic LLM fine-tuned from [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) to generate critiques of model-written answers.

- **Project Page:** https://critic-rl.github.io/
- **Paper:** https://arxiv.org/abs/2502.03492
- **Code:** https://github.com/HKUNLP/critic-rl

## Quickstart

We recommend using [vLLM](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for inference:

```python
from vllm import LLM, SamplingParams


def format_prompt_for_ctrl(problem, answer):
    """Given a question-answer pair, we ask the model to generate a critique."""
    return f"""You are tasked with analyzing an answer to a problem and providing constructive feedback. Do NOT provide direct solutions.

Problem description:
<problem>
{problem}
</problem>

Answer:
<answer>
{answer}
</answer>

Structure your response using the following format (without <format> tags):
<format>
Analysis:
{{Analysis}}

Improvement suggestions:
{{Suggestions}}

Overall judgment: {{Correct/Incorrect}}
</format>"""


# Sample prompts. Note that the answer below is intentionally flawed
# (it sums odd-length subarrays instead of checking odd indices), so
# the critic has something to critique.
problem = """Write a python function to check whether every odd index contains odd numbers of a given list."""
answer = """```python
def odd_length_sum(arr):
    n = len(arr)
    res = 0

    # Iterate through each element in the array
    for i in range(n):
        # Calculate the number of subarrays in which arr[i] is present
        count = ((i + 1) * (n - i) + 1) // 2

        # If the count is odd, add the element to the result
        if count % 2 == 1:
            res += arr[i]

    return res
```"""

prompts = [
    format_prompt_for_ctrl(problem, answer),
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=1024)

# Create an LLM. Adjust tensor_parallel_size to the number of available GPUs.
llm = LLM(model="Zhihui/CTRL-32B", tensor_parallel_size=2)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
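Because the prompt pins down a fixed output format, the final verdict can be recovered from the generated critique with a small parser. The helper below is a sketch we add for illustration (it is not part of the CTRL codebase); it assumes the model followed the requested format and emitted an `Overall judgment:` line, and returns `None` otherwise:

```python
import re


def parse_judgment(critique: str):
    """Extract the final Correct/Incorrect verdict from a critique.

    Returns "Correct", "Incorrect", or None if no judgment line is found.
    (Hypothetical helper for post-processing CTRL outputs.)
    """
    match = re.search(r"Overall judgment:\s*(Correct|Incorrect)", critique, re.IGNORECASE)
    return match.group(1).capitalize() if match else None
```

For example, `parse_judgment(generated_text)` on a well-formed critique of the flawed answer above would yield `"Incorrect"`, which can then be used as a scalar signal (e.g. for filtering or reranking candidate answers).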

## Citation

```bibtex
@article{xie2025teaching,
  title={Teaching Language Models to Critique via Reinforcement Learning},
  author={Xie, Zhihui and Chen, Liyu and Mao, Weichao and Xu, Jingjing and Kong, Lingpeng and others},
  journal={arXiv preprint arXiv:2502.03492},
  year={2025}
}
```