---
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
---

<a href="https://iclr.cc/Conferences/2024" target="_blank">
  <img alt="ICLR 2024" src="https://img.shields.io/badge/Proceedings-ICLR2024-red" />
</a>

Official checkpoint for [Tool-Augmented Reward Modeling (ICLR 2024 spotlight)](https://openreview.net/pdf?id=d94x0gWTUX).

# Model Description

Themis is a tool-augmented preference model that addresses the limitations of conventional reward models (RMs) by empowering them with access to external environments, including calculators and search engines.
It was introduced in the [ICLR 2024 paper](https://arxiv.org/pdf/2310.01045.pdf) and first released in this [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model).
Themis-7b is trained on [TARA](https://huggingface.co/datasets/baidu/TARA) and achieves a noteworthy overall improvement of 17.7% across eight tasks in preference ranking.
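
For reference, below is a minimal loading sketch using the standard `transformers` auto classes. This is an assumption rather than the official interface: the tool-invocation and reward-scoring pipeline (including its input template) is implemented in the official repository linked above, which should be consulted for the exact usage.

```python
# Minimal loading sketch (assumed, not the official scoring API).
# The tool-calling and reward-scoring logic lives in the official
# repository: https://github.com/ernie-research/Tool-Augmented-Reward-Model
from transformers import AutoTokenizer, AutoModel

model_id = "baidu/Themis-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Hypothetical plain-text probe; the official pipeline formats
# (question, candidate answer) pairs with its own template and tools.
inputs = tokenizer("Question: What is 2+2? Answer: 4", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # base-model hidden states
```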

## 🔥 News

* **9 February, 2024:** 🎉 We release the official codebase and model weights of [`baidu/Themis-7b`](https://huggingface.co/baidu/Themis-7b). Stay tuned! 🔥
* **16 January, 2024:** 🎉 Our work has been accepted to [ICLR 2024](https://iclr.cc/Conferences/2024) as a **spotlight**! ✨

# Citation

```bibtex
@inproceedings{tarm-2024-ernie,
  author    = {Lei Li and
               Yekun Chai and
               Shuohuan Wang and
               Yu Sun and
               Hao Tian and
               Ningyu Zhang and
               Hua Wu},
  title     = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year      = {2024},
  url       = {https://openreview.net/forum?id=d94x0gWTUX},
}
```