---
license: apache-2.0
pipeline_tag: text-generation
---

# Rationalyst

This model is a fine-tuned version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct). It was introduced in [RATIONALYST: Pre-training Process-Supervision for Improving Reasoning](https://arxiv.org/pdf/2410.01044). The code for rationale extraction, model training, and inference can be found [here](https://github.com/JHU-CLSP/reasoning_world_model).

## Model description

Implicit rationales are often embedded in unlabelled text, reflecting the natural thought processes behind speech and writing. RATIONALYST is a self-supervised approach that extracts and filters these implicit rationales from unlabelled text and applies them to supervise reasoning.

## How to use

Provide a question and a partial reasoning trajectory as input. The model outputs a rationale that supervises the next reasoning step.

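The input/output contract described above can be sketched with the `transformers` library. This is a minimal sketch, not the authors' inference code: the prompt layout in `build_prompt` is a hypothetical template, and both `build_prompt` and `generate_rationale` are illustrative helper names. The prompt format actually used in training is defined in the linked repository.

```python
from __future__ import annotations


def build_prompt(question: str, steps: list[str]) -> str:
    """Format a question and the reasoning steps so far into one prompt.

    NOTE: this layout is a hypothetical template chosen for illustration,
    not the format used to train the model.
    """
    lines = [f"Question: {question}", "Reasoning so far:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append("Rationale for the next step:")
    return "\n".join(lines)


def generate_rationale(model_id: str, question: str, steps: list[str],
                       max_new_tokens: int = 64) -> str:
    """Generate a rationale for the next reasoning step.

    `model_id` is the Hub repository ID (or local path) of this model;
    it is passed in by the caller rather than assumed here.
    """
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(question, steps), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_rationale` with this model's repository ID, a question, and the list of steps produced so far returns the generated rationale as a string. Loading an 8B-parameter model on CPU in full precision is slow, so in practice you may want to pass a reduced-precision dtype or a device placement option to `from_pretrained`.
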

## Training data

Rationalyst was trained on 65k implicit rationales from The Pile and 14k implicit rationales from GSM8K and ECQA. The training data is available [here](https://huggingface.co/datasets/Dongwei/reasoning_world_model).

## Evaluation results

On downstream tasks, this model achieves the following results:

| Task | GSM8K | MATH | ECQA | HellaSwag | ProofWriter | ARC | MMLU-Pro |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|
| Score | 81.6 | 32.5 | 75.2 | 60.3 | 90.7 | 80.7 | 45.3 |