---
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
---
<a href="https://iclr.cc/Conferences/2024" target="_blank">
<img alt="ICLR 2024" src="https://img.shields.io/badge/Proceedings-ICLR2024-red" />
</a>
Official checkpoint for [Tool-Augmented Reward Modeling (ICLR 2024 spotlight)](https://openreview.net/pdf?id=d94x0gWTUX).
# Model Description
Themis is a tool-augmented preference model that addresses the limitations of conventional reward models (RMs) by empowering them with access to external environments, including calculators and search engines.
It was introduced in the [ICLR 2024 paper](https://arxiv.org/pdf/2310.01045.pdf) and first released in this [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model).
Themis-7b is trained on [TARA](https://huggingface.co/datasets/baidu/TARA), achieving a noteworthy overall improvement of 17.7% in preference ranking across eight tasks.
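For reference, below is a minimal loading sketch using the standard πŸ€— `transformers` auto classes. This is an assumption on our part: the exact model class, prompt format, and tool-invocation protocol are defined in the [official repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model), which you should follow for actual preference scoring.
```python
# Minimal loading sketch. Assumption: the checkpoint is compatible with
# the standard transformers auto classes; the exact model class, prompt
# format, and tool-invocation protocol are defined in the official
# repository linked above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/Themis-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative forward pass over a (question, answer) pair; how the
# scalar reward is read out follows the official codebase (not shown).
inputs = tokenizer("Question: ...\nAnswer: ...", return_tensors="pt")
outputs = model(**inputs)
```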
## πŸ”₯ News
* **9 February, 2024:** πŸŽ‰ We release the official codebase and model weights of [`baidu/Themis-7b`](https://huggingface.co/baidu/Themis-7b). Stay tuned! πŸ”₯
* **16 January, 2024:** πŸŽ‰ Our work has been accepted to [ICLR 2024](https://iclr.cc/Conferences/2024) **spotlight**! ✨
# Citation
```text
@inproceedings{tarm-2024-ernie,
author = {Lei Li and
Yekun Chai and
Shuohuan Wang and
Yu Sun and
Hao Tian and
Ningyu Zhang and
Hua Wu},
title = {Tool-Augmented Reward Modeling},
booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
year = {2024},
url = {https://openreview.net/forum?id=d94x0gWTUX},
}
```