nvidia/Llama-3.1-Nemotron-70B-Reward-HF · Question on tokenizer chat template and readout

Here is what I get for a chat template (when running print(tokenizer.apply_chat_template([{"role": "user", "content": "what is 1+1?"}, {"role": "assistant", "content": "It is 2."}], tokenize=False), add_generation_prompt=False) )

<extra_id_0>System
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

<extra_id_1>User
what is 1+1?
<extra_id_1>Assistant
It is 2.
<extra_id_2>

From what I understand, the reward is the predicted output from the very last token. Here, the last token is just > (<extra_id_2> is not a special token in the tokenizer). I would like to double check that this is intended behavior, and if so, ask why your team chose not to use a special token as the last token that the reward is read-out from. Thanks!