---
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
---

<a href="https://iclr.cc/Conferences/2024" target="_blank">
  <img alt="ICLR 2024" src="https://img.shields.io/badge/Proceedings-ICLR2024-red" />
</a>

Official checkpoint for [Tool-Augmented Reward Modeling (ICLR 2024 spotlight)](https://openreview.net/pdf?id=d94x0gWTUX).

# Model Description

Themis is a tool-augmented preference model that addresses the limitations of conventional reward models (RMs) by empowering them with access to external environments, including calculators and search engines.
It was introduced in the [ICLR 2024 paper](https://arxiv.org/pdf/2310.01045.pdf) and first released in this [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model).
Themis-7b is trained on [TARA](https://huggingface.co/datasets/baidu/TARA) and achieves a noteworthy overall improvement of 17.7% across eight tasks in preference ranking.
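
For reference, below is a minimal loading sketch using the standard `transformers` auto classes. This is an assumption rather than the official interface: the tool-invocation and reward-scoring pipeline (including its input template) is implemented in the official repository linked above, which should be consulted for the exact usage.

```python
# Minimal loading sketch (assumed, not the official scoring API).
# The tool-calling and reward-scoring logic lives in the official
# repository: https://github.com/ernie-research/Tool-Augmented-Reward-Model
from transformers import AutoTokenizer, AutoModel

model_id = "baidu/Themis-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Hypothetical plain-text probe; the official pipeline formats
# (question, candidate answer) pairs with its own template and tools.
inputs = tokenizer("Question: What is 2+2? Answer: 4", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # base-model hidden states
```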

## 🔥 News

* **9 February, 2024:** 🎉 We release the official codebase and model weights of [`baidu/Themis-7b`](https://huggingface.co/baidu/Themis-7b). Stay tuned! 🔥
* **16 January, 2024:** 🎉 Our work has been accepted to [ICLR 2024](https://iclr.cc/Conferences/2024) as a **spotlight**! ✨

# Citation

```bibtex
@inproceedings{tarm-2024-ernie,
  author    = {Lei Li and
               Yekun Chai and
               Shuohuan Wang and
               Yu Sun and
               Hao Tian and
               Ningyu Zhang and
               Hua Wu},
  title     = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year      = {2024},
  url       = {https://openreview.net/forum?id=d94x0gWTUX},
}
```