---
license: apache-2.0
language:
- en
datasets:
- sl-alex/openai-prm800k-solutions-only
---

Finetunes Llama-7b+Alpaca to solve problems via stepwise reasoning (OpenAI [PRM800k dataset](https://github.com/openai/prm800k), or rather our postprocessed version, [`sl-alex/openai-prm800k-solutions-only`](https://huggingface.co/datasets/sl-alex/openai-prm800k-solutions-only)).

## Model description

This is a fork of [`llama-7b`](https://huggingface.co/huggyllama/llama-7b) + [`tloen/alpaca-lora-7b`](https://huggingface.co/tloen/alpaca-lora-7b).

That is: we loaded Llama-7b, applied the Alpaca LoRA, expanded the vocabulary, then finetuned from there with 4-bit QLoRA.
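As a rough picture of how these parts combine, here is a sketch that assumes both adapters are standard LoRA updates on the same linear layers. The card does not say which modules are targeted, or whether the Alpaca LoRA was merged into the base weights before quantization. Under that assumption, each adapted weight matrix acts as:

$$
W_{\text{eff}} = \operatorname{dequant}\!\left(W_0^{\text{4-bit}}\right) + \frac{\alpha_0}{r_0} B_0 A_0 + \frac{\alpha_1}{r_1} B_1 A_1
$$

Here $W_0$ is the frozen Llama-7b weight (held in 4-bit during QLoRA training), $B_0 A_0$ is the Alpaca LoRA (LoRA 0) update with rank $r_0$ and scaling $\alpha_0$, and $B_1 A_1$ is LoRA 1. We assume LoRA 1 is the only pair trained in this run.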

Parts:

- base model [`llama-7b`](https://huggingface.co/huggyllama/llama-7b)
- LoRA 0 [`tloen/alpaca-lora-7b`](https://huggingface.co/tloen/alpaca-lora-7b)
- LoRA 1
  - `adapter_config.json`
  - `adapter_model.bin`
- tokenizer (this is just [`llama-7b`](https://huggingface.co/huggyllama/llama-7b)'s tokenizer, plus a `[PAD]` token for which no embedding was learned. In practice, you can run batch-of-1 inference using the original [`llama-7b`](https://huggingface.co/huggyllama/llama-7b) tokenizer instead)
  - `added_tokens.json`
  - `special_tokens_map.json`
  - `tokenizer.model`
  - `tokenizer_config.json`
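To illustrate why the `[PAD]` token only matters for batches larger than one, here is a minimal sketch of padding a batch of token-id sequences to a common length. `PAD_ID = 32000` is an assumption (the first id after Llama-7b's 32,000-token vocabulary; check `added_tokens.json`). The choice of left padding is also ours, because it is common for decoder-only generation. The repository may pad differently.

```python
# Assumption: [PAD] gets id 32000, the first id after Llama-7b's base vocabulary.
PAD_ID = 32000


def left_pad(
    batch: list[list[int]], pad_id: int = PAD_ID
) -> tuple[list[list[int]], list[list[int]]]:
    """Return (input_ids, attention_mask), left-padded to the longest sequence."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        # 0 marks padding positions to ignore; 1 marks real tokens.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[1, 2, 3], [1, 5]])
print(ids)   # [[1, 2, 3], [32000, 1, 5]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```

With a batch of one, every sequence is already the longest, so no pad id is ever emitted. This is why the original tokenizer suffices in that case.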

## Training

Trained using [`qlora.py`](https://github.com/scottlogic-alex/qlora/blob/stepwise/qlora.py) from our [`stepwise`](https://github.com/scottlogic-alex/qlora/tree/stepwise) branch of [qlora](https://github.com/artidoro/qlora).  
Known-good as of commit [`3a86919`](https://github.com/scottlogic-alex/qlora/blob/3a8691986b6718562bcd8e3522447b52842c1d9a/qlora.py).

`python -m qlora --model_name_or_path huggyllama/llama-7b --lora_name_or_path tloen/alpaca-lora-7b --dataset prm800k-solutions --dataset_format prm800k-solutions --bf16 --max_memory_MB 24000 --use_bos_token_in_prompt --truncate_toward_center --source_max_len 184 --target_max_len 998 --gradient_accumulation_steps 4 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --learning_rate 0.0002 --run_name 13b_alpaca_special_tokens_long --report_to wandb --save_steps 64 --save_total_limit 3 --max_steps 1664 --evaluation_strategy steps --eval_steps 64 --generate_steps 16`
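For orientation, here are a few back-of-envelope numbers derived from the flags in that command. The device count is an assumption: `--max_memory_MB 24000` suggests a single 24 GB GPU, but the card does not state how many devices were used. We also assume `max_steps` and `eval_steps` count optimizer updates, as in Hugging Face `TrainingArguments`.

```python
# Values copied from the qlora.py command above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single GPU, not stated in the card
max_steps = 1664
eval_steps = 64
source_max_len = 184
target_max_len = 998

# Examples contributing to each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Total training examples drawn over the run (repeats across epochs included).
examples_seen = effective_batch * max_steps
# Evaluation passes, one every eval_steps optimizer steps.
num_evals = max_steps // eval_steps
# Upper bound on tokens per training example (prompt + response), after truncation.
max_tokens_per_example = source_max_len + target_max_len

print(effective_batch, examples_seen, num_evals, max_tokens_per_example)
# → 16 26624 26 1182
```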

## Usage

You can load the model using [`evaluate.py`](https://github.com/scottlogic-alex/qlora/blob/stepwise/evaluate.py#L209-L278) from our [`stepwise`](https://github.com/scottlogic-alex/qlora/tree/stepwise) branch of [qlora](https://github.com/artidoro/qlora).  
Known-good as of commit [`3a86919`](https://github.com/scottlogic-alex/qlora/blob/3a8691986b6718562bcd8e3522447b52842c1d9a/evaluate.py).

```bash
git clone -b stepwise https://github.com/scottlogic-alex/qlora.git
cd qlora
python -m evaluate \
--model_name_or_path huggyllama/llama-7b \
--base_lora_model_name_or_path tloen/alpaca-lora-7b \
--tokenizer_model_name_or_path huggyllama/llama-7b \
--lora_model_name_or_path sl-alex/llama-7b-alpaca-stepwise-lora \
--use_bos_token_in_prompt \
--overrun_countermeasures False
```

## Prompt style

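The model is finetuned for instruction-response, using the Alpaca prompt template. The prompt ends with `<|step_start|>`, which opens the first reasoning step: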

`prompt = '<s>Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nHow do I feed a family of 5 without ingredients?\n\n### Response:\n<|step_start|>'`
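A minimal sketch for building that prompt around any instruction follows. The template text is copied from the example above, but the function and constant names are our own and do not come from the repository. The leading `<s>` matches `--use_bos_token_in_prompt`. If your tokenizer also prepends a BOS token automatically, you may end up with two, so check the tokenized ids.

```python
# Template copied from the prompt above; names here are illustrative, not from the repo.
ALPACA_STEPWISE_TEMPLATE = (
    "<s>Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n<|step_start|>"
)


def build_stepwise_prompt(instruction: str) -> str:
    """Insert the instruction; generation continues from the trailing <|step_start|>."""
    return ALPACA_STEPWISE_TEMPLATE.format(instruction=instruction)


print(build_stepwise_prompt("How do I feed a family of 5 without ingredients?"))
```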

## Example outputs

`$ How do I drink a matcha latte?`  
`I’ve never drunk a matcha latte before, but I’ve seen people do it online. It looks like they dump the entire contents of a little teabag into a frothed milk drink and then mix it well. I think I could do that.<|step_end|><|step_start|>I’ll have to buy a matcha latte kit first, because I don’t know what they have in the teabags. I’ll see if my favorite coffee shop sells them.<|step_end|><|step_start|>I’ll also need a whisk or a mixer. I’ll look online for the best one. Which one do you recommend?<|step_end|><|step_start|>I’ll need to heat up the milk first. Which will you use -- the microwave or the stovetop?<|step_end|><|step_start|>I’ll warm up 16 ounces of milk in the microwave. For the stovetop, I’ll just put it on the burner for two minutes. I don’t want it to boil.`

`$ What is the third Fibonacci number?`  
`I need to find the third Fibonacci number, which is the number after the second, which is F(2).<|step_end|><|step_start|>There are two ways to get the third Fibonacci number. I could either wait one more day and get F(3) using the Fibonacci sequence, or I could do something more clever and faster by using a computer. <|step_end|><|step_start|>Computers are great for mathematics, so I'll see if I can get F(3) quicker with a computer.<|step_end|><|step_start|>First, though, I'll explain what Fibonacci means. It's named after a 13th-century Italian accountant named Leonardo Fibonacci who created a sequence of numbers where each following number is the sum of the two previous ones. <|step_end|><|step_start|>The Fibonacci sequence begins 0, 1, 1, 2, 3, 5, 8, 13, 21, and so on. <|step_end|><|`

`$ How do I make a cup of tea?`  
`First, I find a teaspoonful of loose-leaf Assam tea in a teapot. Then, I fill the teapot with water that has been warming up over a stovetop burner.<|step_end|><|step_start|>Once the water has reached a rolling boil, I pour it over the tea, cover the pot, and steam it for at least two minutes. Uncover the pot, remove the lid, and serve. That's how to make a cup of tea.`  
  
`<|response_start|>I've always thought Assam tea was an Indian tea, but this web site says it comes from the north-center part of Bangladesh. I wonder if there's a difference between Assam and Bangladesh tea.<|Response_End|><|step_start|>BANGLADESH tea has all the qualities of Assam, but the difference in the final cup lies in the aroma and the lingering aftertaste.`  
  
`<|step_end|><|response_start|>The Assam tea comes in very large, flat pieces of processed leaves. They are cut from the tree and rolled`

`$ How do I open a jar?`  
`I need to find a jar opener at my school. I can't just use my mouth because it's too dangerous to choke on the lid, and it would ruin my school uniform. I could bring a bottle opener from home, but it might not work. I also can't use a knife, because that could get me into trouble with the headmaster. The solution is to ask a friendly student to open the jar for me.</|step_end|><|step_start|>I find a student who is willing to help. We both hold the jar with our hands on the inside and outside of the rim, and she uses her nails to pry the lid off. She has been doing this for years, and it looks easy, but I should be careful not to scratch myself.</|step_end|><|step_start|>We set the jar down and all the pieces are there: the lid, the jar, and the plastic bag. I can never open another jar without feeling proud of myself. My classmates are impressed by my skills as a consumer.</|step_end|><|step_start|>I should keep the lid as a souvenir, but`