---
license: apache-2.0
library_name: vllm
pipeline_tag: text-generation
---

# CTRL: Critic Training via Reinforcement Learning

CTRL-32B is a critic LLM fine-tuned from [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) to generate critiques of model-written answers.

- **Project Page:** https://critic-rl.github.io/
- **Paper:** https://arxiv.org/abs/2502.03492
- **Code:** https://github.com/HKUNLP/critic-rl

## Quickstart

We recommend using [vLLM](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for inference:

```python
from vllm import LLM, SamplingParams


def format_prompt_for_ctrl(problem, answer):
    """Given a question-answer pair, we ask the model to generate a critique."""
    return f"""You are tasked with analyzing an answer to a problem and providing constructive feedback. Do NOT provide direct solutions.

Problem description:
<problem>
{problem}
</problem>

Answer:
<answer>
{answer}
</answer>

Structure your response using the following format (without <format> tags):
<format>
Analysis:
{{Analysis}}

Improvement suggestions:
{{Suggestions}}

Overall judgment: {{Correct/Incorrect}}
</format>"""


# Sample prompts. Note that the answer below is intentionally flawed
# (it sums odd-length subarrays instead of checking odd indices), so
# the critic has something to critique.
problem = """Write a python function to check whether every odd index contains odd numbers of a given list."""
answer = """```python
def odd_length_sum(arr):
    n = len(arr)
    res = 0

    # Iterate through each element in the array
    for i in range(n):
        # Calculate the number of subarrays in which arr[i] is present
        count = ((i + 1) * (n - i) + 1) // 2

        # If the count is odd, add the element to the result
        if count % 2 == 1:
            res += arr[i]

    return res
```"""

prompts = [
    format_prompt_for_ctrl(problem, answer),
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=1024)

# Create an LLM. Adjust tensor_parallel_size to the number of available GPUs.
llm = LLM(model="Zhihui/CTRL-32B", tensor_parallel_size=2)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
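Because the prompt pins down a fixed output format, the final verdict can be recovered from the generated critique with a small parser. The helper below is a sketch we add for illustration (it is not part of the CTRL codebase); it assumes the model followed the requested format and emitted an `Overall judgment:` line, and returns `None` otherwise:

```python
import re


def parse_judgment(critique: str):
    """Extract the final Correct/Incorrect verdict from a critique.

    Returns "Correct", "Incorrect", or None if no judgment line is found.
    (Hypothetical helper for post-processing CTRL outputs.)
    """
    match = re.search(r"Overall judgment:\s*(Correct|Incorrect)", critique, re.IGNORECASE)
    return match.group(1).capitalize() if match else None
```

For example, `parse_judgment(generated_text)` on a well-formed critique of the flawed answer above would yield `"Incorrect"`, which can then be used as a scalar signal (e.g. for filtering or reranking candidate answers).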

## Citation

```bibtex
@article{xie2025teaching,
  title={Teaching Language Models to Critique via Reinforcement Learning},
  author={Xie, Zhihui and Chen, Liyu and Mao, Weichao and Xu, Jingjing and Kong, Lingpeng and others},
  journal={arXiv preprint arXiv:2502.03492},
  year={2025}
}
```