
---
language:
- en
license: llama2
tags:
- Xwin
- Euryale 1.3
- frankenmerge
- 90b
pipeline_tag: conversational
model-index:
- name: BigWeave-v6-90b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 65.36
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 87.21
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 68.04
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 57.96
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 44.58
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
---
# BigWeave v6 90B

<img src="https://cdn-uploads.huggingface.co/production/uploads/65a6db055c58475cf9e6def1/4CbbAN-X7ZWj702JrcCGH.png" width=600>

A Goliath-120b-style frankenmerge of Xwin-LM-70b-v0.1 and Euryale-1.3-70b, in which layer slices from the two models are stacked into a single, deeper model. The goal is to find other merge combinations that work well.

The version number is for my own tracking of the merges; only results that seem to work reasonably well are kept and published.

# Prompting Format
Vicuna and Alpaca.
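The card names the two formats without spelling them out. As a rough sketch, the helpers below build single-turn prompts using the commonly published Vicuna v1.1 system line and the standard Alpaca instruction header. These exact strings, and the helper names `vicuna_prompt` and `alpaca_prompt`, are assumptions for illustration, not something this card specifies.

```python
# Minimal sketch of the two prompt formats named above.
# ASSUMPTION: the template strings follow the commonly used Vicuna v1.1 and
# Alpaca conventions; they are not taken from this model card.

VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def vicuna_prompt(user_message: str) -> str:
    """Build a single-turn Vicuna-style prompt (hypothetical helper)."""
    return f"{VICUNA_SYSTEM} USER: {user_message} ASSISTANT:"


def alpaca_prompt(instruction: str) -> str:
    """Build a single-turn Alpaca-style prompt (hypothetical helper)."""
    return f"{ALPACA_HEADER}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


if __name__ == "__main__":
    question = "Summarize the plot of Hamlet in two sentences."
    print(vicuna_prompt(question))
    print("---")
    print(alpaca_prompt(question))
```

In both cases the model is expected to continue the text right after the final marker (`ASSISTANT:` or `### Response:`).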

# Merge process
The models used in the merge are [Xwin-LM-70b-v0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1) and [Euryale-1.3-70b](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B).

The layer mix:
```yaml
- range 0, 12
  Xwin
- range 9, 14
  Euryale
- range 12, 62
  Xwin
- range 54, 71
  Euryale
- range 62, 80
  Xwin
```
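To make the layer mix concrete, the sketch below renders it as a mergekit-style `passthrough` config and counts the resulting decoder layers. It assumes each range is half-open `[start, end)`, like mergekit's `layer_range`, and that the merge is a plain passthrough stack as in Goliath-style merges; the `dtype` value and all helper names are illustrative assumptions, not the author's actual config. The parameter estimate plugs in the published Llama-2-70B shapes (hidden size 8192, MLP size 28672, 8 key/value heads of dimension 128, 32000-token vocabulary, untied output head).

```python
# Sketch: the BigWeave v6 layer mix expressed as mergekit-style slices.
# ASSUMPTIONS: ranges are half-open [start, end) like mergekit's layer_range,
# the merge is a plain "passthrough" stack, and dtype is illustrative only.
XWIN = "Xwin-LM/Xwin-LM-70B-V0.1"
EURYALE = "Sao10K/Euryale-1.3-L2-70B"

LAYER_MIX = [
    (XWIN, 0, 12),
    (EURYALE, 9, 14),
    (XWIN, 12, 62),
    (EURYALE, 54, 71),
    (XWIN, 62, 80),
]

# Published Llama-2-70B shapes (both source models share this architecture).
HIDDEN = 8192
INTERMEDIATE = 28672
KV_DIM = 8 * 128  # 8 key/value heads x head dim 128 (grouped-query attention)
VOCAB = 32000


def to_mergekit_yaml(mix, dtype="float16"):
    """Render the slices as mergekit-style YAML text without needing PyYAML."""
    lines = ["slices:"]
    for model, start, end in mix:
        lines.append("  - sources:")
        lines.append(f"      - model: {model}")
        lines.append(f"        layer_range: [{start}, {end}]")
    lines.append("merge_method: passthrough")
    lines.append(f"dtype: {dtype}")
    return "\n".join(lines)


def total_layers(mix):
    """Number of decoder layers in the stacked model."""
    return sum(end - start for _, start, end in mix)


def params_per_layer():
    """Weights in one Llama-2-70B decoder layer (no biases)."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o and k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                    # gate, up, down projections
    norms = 2 * HIDDEN                                 # two RMSNorm weight vectors
    return attn + mlp + norms


def estimate_params(n_layers):
    """Decoder layers + input embeddings + untied LM head + final norm."""
    return n_layers * params_per_layer() + 2 * VOCAB * HIDDEN + HIDDEN


if __name__ == "__main__":
    print(to_mergekit_yaml(LAYER_MIX))
    n = total_layers(LAYER_MIX)
    print(f"{n} layers, ~{estimate_params(n) / 1e9:.1f}B parameters")
```

Under these assumptions the stack has 102 layers (versus 80 in each source model) and roughly 87.8B parameters, versus about 69.0B for an 80-layer original. The overlapping ranges are intentional: for example, layers 9 to 11 appear once from Xwin and once from Euryale, which is what interleaves the two models.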

# Acknowledgements
[@Xwin-LM](https://huggingface.co/Xwin-LM) for creating Xwin

[@Sao10K](https://huggingface.co/Sao10K) for creating Euryale

[@alpindale](https://huggingface.co/alpindale) for creating the original Goliath

[@chargoddard](https://huggingface.co/chargoddard) for developing [mergekit](https://github.com/cg123/mergekit).

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_llmixer__BigWeave-v6-90b).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |67.47|
|AI2 Reasoning Challenge (25-Shot)|65.36|
|HellaSwag (10-Shot)              |87.21|
|MMLU (5-Shot)                    |68.04|
|TruthfulQA (0-shot)              |57.96|
|Winogrande (5-shot)              |81.69|
|GSM8k (5-shot)                   |44.58|
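The "Avg." row is the unweighted arithmetic mean of the six benchmark scores, which a quick check confirms:

```python
# Check that the leaderboard's "Avg." row is the unweighted mean of the six scores.
from statistics import mean

scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.36,
    "HellaSwag (10-Shot)": 87.21,
    "MMLU (5-Shot)": 68.04,
    "TruthfulQA (0-shot)": 57.96,
    "Winogrande (5-shot)": 81.69,
    "GSM8k (5-shot)": 44.58,
}

avg = round(mean(scores.values()), 2)
print(f"Avg. = {avg}")  # Avg. = 67.47
```

The six scores sum to 404.84, and 404.84 / 6 = 67.473..., which rounds to the 67.47 shown in the table.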