	---
	language:
	- en
	tags:
	- pytorch
	- causal-lm
	- pythia
	license: apache-2.0
	datasets:
	- Anthropic/hh-rlhf
	---

	[Pythia-410m](https://huggingface.co/EleutherAI/pythia-410m) fine-tuned with supervised fine-tuning (SFT) for 1 epoch on the helpful subset of the [Anthropic-hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf), using the TRLx library.
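
A minimal usage sketch with the `transformers` library. It assumes the repository ships tokenizer files alongside the weights and that prompts follow the `\n\nHuman: ...\n\nAssistant:` layout of the hh-rlhf transcripts; neither is confirmed by this card. `build_prompt` and `generate_reply` are hypothetical helper names introduced for illustration.

```python
MODEL_ID = "lomahony/pythia-410m-helpful-sft"


def build_prompt(user_message: str) -> str:
    """Wrap one user turn in the hh-rlhf dialogue layout (assumed format)."""
    return f"\n\nHuman: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 64) -> str:
    """Load the model from the Hub and greedily decode one reply."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, for repeatable output
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("How do I bake bread?")` downloads the weights on first use. Greedy decoding is chosen here only for repeatability; sampling settings are left to the reader.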

	Checkpoints are also uploaded.

	Fully reproducible finetuning code is available on [GitHub](https://github.com/lauraaisling/trlx-pythia/tree/main).

	[wandb log](https://wandb.ai/lauraomahony999/pythia-sft/runs/quq2097z)

	See [Pythia-410m](https://huggingface.co/EleutherAI/pythia-410m) for model details [(paper)](https://arxiv.org/abs/2101.00027).

	See further details of these models in the paper [Attributing Mode Collapse in the Fine-Tuning of Large Language Models](https://openreview.net/pdf?id=3pDMYjpOxk).

	If these models are helpful, you can cite them as follows:

	<pre>
	@inproceedings{o2024attributing,
	title={Attributing Mode Collapse in the Fine-Tuning of Large Language Models},
	author={O’Mahony, Laura and Grinsztajn, Leo and Schoelkopf, Hailey and Biderman, Stella},
	booktitle={ICLR 2024, Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) workshop},
	year={2024}
	}
	</pre>

	hf (pretrained=lomahony/pythia-410m-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16
	|    Tasks     |Version|Filter|n-shot|    Metric     | Value |   |Stderr|
	|--------------|------:|------|-----:|---------------|------:|---|------|
	|arc_challenge |      1|none  |     0|acc            | 0.2355|±  |0.0124|
	|              |       |none  |     0|acc_norm       | 0.2594|±  |0.0128|
	|arc_easy      |      1|none  |     0|acc            | 0.5051|±  |0.0103|
	|              |       |none  |     0|acc_norm       | 0.4478|±  |0.0102|
	|boolq         |      2|none  |     0|acc            | 0.6113|±  |0.0085|
	|hellaswag     |      1|none  |     0|acc            | 0.3372|±  |0.0047|
	|              |       |none  |     0|acc_norm       | 0.4001|±  |0.0049|
	|lambada_openai|      1|none  |     0|perplexity     |21.8172|±  |0.7736|
	|              |       |none  |     0|acc            | 0.3755|±  |0.0067|
	|openbookqa    |      1|none  |     0|acc            | 0.1940|±  |0.0177|
	|              |       |none  |     0|acc_norm       | 0.2960|±  |0.0204|
	|piqa          |      1|none  |     0|acc            | 0.6719|±  |0.0110|
	|              |       |none  |     0|acc_norm       | 0.6687|±  |0.0110|
	|sciq          |      1|none  |     0|acc            | 0.7700|±  |0.0133|
	|              |       |none  |     0|acc_norm       | 0.6540|±  |0.0151|
	|wikitext      |      2|none  |     0|word_perplexity|23.8136|±  |N/A   |
	|              |       |none  |     0|byte_perplexity| 1.8091|±  |N/A   |
	|              |       |none  |     0|bits_per_byte  | 0.8553|±  |N/A   |
	|winogrande    |      1|none  |     0|acc            | 0.5320|±  |0.0140|

	hf (pretrained=lomahony/pythia-410m-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
	|    Tasks     |Version|Filter|n-shot|    Metric     | Value |   |Stderr|
	|--------------|------:|------|-----:|---------------|------:|---|------|
	|arc_challenge |      1|none  |     5|acc            | 0.2355|±  |0.0124|
	|              |       |none  |     5|acc_norm       | 0.2790|±  |0.0131|
	|arc_easy      |      1|none  |     5|acc            | 0.5274|±  |0.0102|
	|              |       |none  |     5|acc_norm       | 0.5072|±  |0.0103|
	|boolq         |      2|none  |     5|acc            | 0.5226|±  |0.0087|
	|hellaswag     |      1|none  |     5|acc            | 0.3367|±  |0.0047|
	|              |       |none  |     5|acc_norm       | 0.3991|±  |0.0049|
	|lambada_openai|      1|none  |     5|perplexity     |37.4791|±  |1.3737|
	|              |       |none  |     5|acc            | 0.3049|±  |0.0064|
	|openbookqa    |      1|none  |     5|acc            | 0.1620|±  |0.0165|
	|              |       |none  |     5|acc_norm       | 0.2900|±  |0.0203|
	|piqa          |      1|none  |     5|acc            | 0.6708|±  |0.0110|
	|              |       |none  |     5|acc_norm       | 0.6676|±  |0.0110|
	|sciq          |      1|none  |     5|acc            | 0.8630|±  |0.0109|
	|              |       |none  |     5|acc_norm       | 0.8430|±  |0.0115|
	|wikitext      |      2|none  |     5|word_perplexity|23.8136|±  |N/A   |
	|              |       |none  |     5|byte_perplexity| 1.8091|±  |N/A   |
	|              |       |none  |     5|bits_per_byte  | 0.8553|±  |N/A   |
	|winogrande    |      1|none  |     5|acc            | 0.5272|±  |0.0140|
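
To help read the `Stderr` column, the sketch below compares a few 0-shot and 5-shot accuracies from the tables above using a two-sample z-score. It assumes the two estimates are independent, which is only an approximation since both runs score the same test examples; the 1.96 threshold (roughly a 95% two-sided level) and the helper name `z_score` are illustrative choices, not part of the evaluation harness.

```python
import math

# (value, stderr) pairs copied from the tables above: (0-shot, 5-shot).
RESULTS = {
    "boolq acc": ((0.6113, 0.0085), (0.5226, 0.0087)),
    "sciq acc": ((0.7700, 0.0133), (0.8630, 0.0109)),
    "winogrande acc": ((0.5320, 0.0140), (0.5272, 0.0140)),
}


def z_score(zero_shot, five_shot):
    """Difference (5-shot minus 0-shot) divided by its combined standard error."""
    (v0, s0), (v5, s5) = zero_shot, five_shot
    return (v5 - v0) / math.sqrt(s0**2 + s5**2)


for name, (zero_shot, five_shot) in RESULTS.items():
    z = z_score(zero_shot, five_shot)
    verdict = "likely a real change" if abs(z) > 1.96 else "within noise"
    print(f"{name}: z = {z:+.2f} ({verdict})")
```

Under these assumptions, the boolq drop and the sciq gain are well outside the noise band, while the winogrande change is not distinguishable from noise.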
