Safetensors
qwen2

Task Description

The task is to train a model to evaluate two pieces of text, one of which has been subtly augmented by an LLM (specifically, the larger 14B variant of my corruption models). The model must write its notes and then a judgment, in consecutive XML tags.

GRPO Task Overview

Example Format

The base model is given a system prompt that establishes the expected template, followed by one "real" and one "synthetic" sample, presented in random order as Sample A and Sample B:

REQUEST: You are to judge the better of the two samples and determine which of the following samples is better using a short judgement that is no longer than (and no shorter than) exactly 128 tokens.

Respond with an exactly 128 tokens tag labeled <notes> that contains your notes, and then <judgement> which is just the letter that you are picking.

For example:

JUDGE: <notes>
Sample A is superior to Sample B... (example notes)
</notes>
<judgement>A</judgement>

Now, it is your turn.

[Sample A]:
Included is a pre-test, post-test, and vocabulary quiz on the 8th grade math standard functions (8.F). 1.) Determine if a graph represents a function 2.) State the domain and range of a relation 3.) Plot points on a graph to determine if the table represents a function 4.) State if a function is decreasing, increasing, or constant 5.) Determine the output of a function machine 6.) Determine the recursive and explicit equation 7.) Determine the minimum, maximum, increasing interval, and decreasing interval of a graph 8.) Determine the rate of change, initial value, independent value, and dependent variable given a graph 9.) Sketch a graph given a situation The vocabulary included is dependent, output, function, domain, range, decreasing function, input, range, non-linear function, relation, increasing function, and function notation. Total Pages: 9 (18 including answer key) Answer Key: Included Document File: PDF

[Sample B]:
Included is a pre-test, post-test, and vocabulary quiz on the 8th grade math standard functions (8.F). 1.) Determine if a graph represents a function 2.) State the domain and range of a relation 3.) Plot points on a graph to determine if the table represents a function 4.) State if a function is increasing, decreasing, or constant 5.) Determine the output of a function given 6.) Determine the input of a function given 7.) Determine a function rule given ordered pairs or a table of values. 8.) Graph functions using a table of values and determine a trend line in a graph 9.) Write a data table situation The vocabulary included is dependent, output, function, domain, range, decreasing function, input, range, non-linear function, relation, increasing function, and function notation. Total Pages: 9 (18 including answer key) Answer Key: Included Document File: PDF

JUDGE:
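
The prompt layout above can be assembled programmatically. The following is a minimal sketch, not the training code: the names `HEADER` and `build_prompt` are hypothetical, the exact whitespace between sections is a guess, and it assumes the "real" sample is the one the judge is expected to pick.

```python
import random

# Instruction text copied from the example prompt above.
# The spacing between sections is an assumption.
HEADER = (
    "REQUEST: You are to judge the better of the two samples and determine "
    "which of the following samples is better using a short judgement that is "
    "no longer than (and no shorter than) exactly 128 tokens.\n\n"
    "Respond with an exactly 128 tokens tag labeled <notes> that contains your "
    "notes, and then <judgement> which is just the letter that you are picking.\n\n"
    "For example:\n\n"
    "JUDGE: <notes>\n"
    "Sample A is superior to Sample B... (example notes)\n"
    "</notes>\n"
    "<judgement>A</judgement>\n\n"
    "Now, it is your turn.\n\n"
)


def build_prompt(real_text, synthetic_text, rng):
    """Return (prompt, answer), where answer is the letter holding the real sample.

    Hypothetical helper: the A/B slot for the real sample is chosen at random
    so that position carries no information about which sample is synthetic.
    """
    real_is_a = rng.random() < 0.5
    if real_is_a:
        sample_a, sample_b = real_text, synthetic_text
    else:
        sample_a, sample_b = synthetic_text, real_text
    prompt = (
        f"{HEADER}"
        f"[Sample A]:\n{sample_a}\n\n"
        f"[Sample B]:\n{sample_b}\n\n"
        "JUDGE:"
    )
    return prompt, ("A" if real_is_a else "B")
```

Passing a seeded `random.Random` instance as `rng` keeps the A/B assignment reproducible across runs.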

A correct output follows this structure:

<notes>
Sample A provides more specific and thoroughly defined tasks. It mentions "function machine," "recursive and explicit equation," and detailed graph analysis with "minimum, maximum" and intervals. Sample B contains incomplete phrases like "output of a function given" without completing the thought, making it less coherent and precise than Sample A.
</notes>
<judgement>A</judgement>
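
An output in this structure can be checked mechanically. The sketch below is an illustration under stated assumptions rather than the scoring code used in training: `parse_judge_output` and `is_correct` are hypothetical helpers, and the 128-token length requirement is not checked because it depends on the tokenizer.

```python
import re

# Matches a <notes> block immediately followed by a <judgement> tag
# containing a single letter, A or B. Surrounding whitespace is tolerated.
OUTPUT_RE = re.compile(
    r"\s*<notes>\s*(?P<notes>.*?)\s*</notes>\s*"
    r"<judgement>(?P<letter>[AB])</judgement>\s*",
    re.DOTALL,
)


def parse_judge_output(text):
    """Return (notes, letter) if text follows the expected structure, else None."""
    match = OUTPUT_RE.fullmatch(text)
    if match is None:
        return None
    return match.group("notes"), match.group("letter")


def is_correct(text, expected_letter):
    """True only when the output is well-formed and picks the expected letter."""
    parsed = parse_judge_output(text)
    return parsed is not None and parsed[1] == expected_letter
```

Using `fullmatch` rejects outputs that carry extra text before `<notes>` or after `</judgement>`, which is a stricter reading of "consecutive XML tags" than the source spells out.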

Model size: 14.8B params
Tensor type: BF16

Model tree for Quest-AI/quest-grpo-judge-14b-v1-205

This model is a fine-tune of the base model Qwen/Qwen2.5-14B.

Dataset used to train Quest-AI/quest-grpo-judge-14b-v1-205