---
base_model:
- Qwen/Qwen1.5-72B-Chat
- abacusai/Liberated-Qwen1.5-72B
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
language:
- en
tags:
- mergekit
- merge
- moe
---

# Qwen1.5-MoE-2x72B

## Description

This model is a Mixture of Experts (MoE) merge of [Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat) and [abacusai/Liberated-Qwen1.5-72B](https://huggingface.co/abacusai/Liberated-Qwen1.5-72B), created with mergekit and released without any further fine-tuning.

It uses a customized mergekit script for MoE merging, available [here](https://github.com/Aratako/mergekit-qwen2).

Due to the structural modifications introduced by MoE, using this model requires the [custom modeling file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/modeling_qwen2.py) and the [custom configuration file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/configuration_qwen2.py). When using the model, place both files in the same folder as the model weights.
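
For reference, the snippet below is a minimal loading sketch, assuming a local download of this repository (the folder path is a placeholder, not an official one); `trust_remote_code=True` lets `transformers` pick up the custom modeling and configuration classes placed alongside the weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: a local folder containing the merged weights together
# with the custom modeling_qwen2.py and configuration_qwen2.py files.
model_path = "./Qwen1.5-MoE-2x72B"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",      # keep the bfloat16 weights as stored
    device_map="auto",       # shard the 2x72B weights across available GPUs
    trust_remote_code=True,  # load the custom Qwen2 MoE classes from the folder
)
```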

This model inherits the [tongyi-qianwen license](https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE).

## Benchmark

[MT-Bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) scores for this model and its two base models are as follows:

**1-turn, 4-bit quantization**

|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|
| Liberated-Qwen1.5-72B | 72B | **5.8** | 7.9 | 9.6 | 6.7 | 7.0 | **9.05** | 9.55 | **9.9** | 8.1875 |
| Qwen1.5-72B-Chat | 72B | 5.5 | **8.7** | 9.7 | **8.4** | 7.5 | 9.0 | 9.45 | 9.75 | **8.5000** |
| This model | 2x72B | 5.6 | 7.8 | **9.75** | 7.0 | **8.1** | 9.0 | **9.65** | 9.8 | 8.3375 |

![mt-bench-1turn](./mt-bench-1turn.png)

**2-turn, 4-bit quantization**

|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|
| Liberated-Qwen1.5-72B | 72B | 3.9 | 8.2 | **10.0** | 5.7 | 5.5 | 8.4 | 8.7 | 8.6 | 7.3750 |
| Qwen1.5-72B-Chat | 72B | **5.2** | 8.8 | **10.0** | **6.1** | 6.7 | 9.0 | **9.8** | **9.5** | 8.1375 |
| This model | 2x72B | 5.0 | **9.5** | 9.9 | 5.6 | **8.1** | **9.3** | 9.6 | 9.2 | **8.2750** |

![mt-bench-2turn](./mt-bench-2turn.png)

## Merge config

[mergekit_moe_config.yml](./mergekit_moe_config.yml)

```yaml
base_model: ./Qwen1.5-72B-Chat
gate_mode: random
dtype: bfloat16
experts:
  - source_model: ./Qwen1.5-72B-Chat
    positive_prompts: []
  - source_model: ./Liberated-Qwen1.5-72B
    positive_prompts: []
tokenizer_source: model:./Qwen1.5-72B-Chat
```
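
As a toy illustration rather than code from this repository, the sketch below shows what `gate_mode: random` means here: the router is a randomly initialized linear layer instead of one computed from `positive_prompts` hidden states, so without further fine-tuning each token is softly mixed across both 72B experts rather than routed by learned preferences:

```python
import torch

# Hypothetical shapes for illustration: Qwen1.5-72B uses a hidden size of
# 8192, and this merge has two experts.
hidden_size, num_experts = 8192, 2

# `gate_mode: random` corresponds to a randomly initialized router like this,
# rather than gate weights derived from prompt activations.
gate = torch.nn.Linear(hidden_size, num_experts, bias=False)

hidden_states = torch.randn(1, 4, hidden_size)                # (batch, seq, hidden)
routing_weights = torch.softmax(gate(hidden_states), dim=-1)  # per-token expert mix
print(routing_weights.shape)  # torch.Size([1, 4, 2])
```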

## Gratitude

- Huge thanks to [Alibaba Cloud Qwen](https://www.alibabacloud.com/solutions/generative-ai/qwen) for training and publishing the weights of the Qwen models
- Thank you to [abacusai](https://huggingface.co/abacusai) for publishing a fine-tuned model based on Qwen
- And huge thanks to [mlabonne](https://huggingface.co/mlabonne), whose [phixtral](https://huggingface.co/mlabonne/phixtral-4x2_8) served as a reference for the customized modeling file