---
base_model:
- sometimesanotion/Lamarck-14B-v0.7
- sometimesanotion/Qwenvergence-14B-v12-Prose-DS
- jpacifico/Chocolatine-2-14B-Instruct-v2.0.3
- suayptalha/Lamarckvergence-14B
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
---

# EXPERIMENTAL

So what's this new arcee_fusion merge method, and what can we do with it? This model aims to find out, as a multi-stage merge where 3 out of 4 steps are fusions (the whole pipeline is sketched in code after the list):

* A fusion of [Lamarck-14B-v0.7](http://huggingface.co/sometimesanotion/Lamarck-14B-v0.7) and @suayptalha's [Lamarckvergence-14B](http://huggingface.co/suayptalha/Lamarckvergence-14B), itself a SLERP merge of Lamarck-14B-v0.7 and [Qwenvergence-14B-v12-Prose-DS](http://huggingface.co/sometimesanotion/Qwenvergence-14B-v12-Prose-DS).
* A SLERP of Lamarck-14B-v0.7-Fusionvergence with Qwenvergence-14B-v12-Prose-DS, the latter emphasized in later layers.
* A fusion of @jpacifico's [Chocolatine-2-14B-Instruct-v2.0.3](http://huggingface.co/jpacifico/Chocolatine-2-14B-Instruct-v2.0.3) - itself a finetune of a merge of Lamarck-14B-v0.7, Arcee's [Virtuoso-Small-v2](https://huggingface.co/arcee-ai/Virtuoso-Small-v2), and Qwenvergence-14B-v12-Prose-DS - fusion-merged with, you guessed it, Qwenvergence-14B-v12-Prose-DS.
* A fusion of the previous two.
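
To keep the data flow straight, here is that pipeline as a rough Python sketch. The `fusion` and `slerp` helpers are hypothetical stand-ins for mergekit's arcee_fusion and slerp methods, not a real API:

```python
# Hypothetical stand-ins for mergekit's merge methods - names and signatures
# are illustrative only, just to show how the stages chain together.
def fusion(base: str, other: str) -> str:
    return f"arcee_fusion({base}, {other})"

def slerp(base: str, other: str) -> str:
    return f"slerp({base}, {other})"

lamarck         = "sometimesanotion/Lamarck-14B-v0.7"
prose_ds        = "sometimesanotion/Qwenvergence-14B-v12-Prose-DS"
lamarckvergence = "suayptalha/Lamarckvergence-14B"  # already a SLERP of the two above
chocolatine     = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"

stage1 = fusion(lamarck, lamarckvergence)   # fusion 1
stage2 = slerp(stage1, prose_ds)            # Prose-DS weighted toward later layers
stage3 = fusion(chocolatine, prose_ds)      # fusion 2
final  = fusion(stage2, stage3)             # fusion 3
print(final)
```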

I've seen strong prose from this model, which is natural considering its re-emphasis of Qwenvergence-14B-v12-Prose-DS. A full evaluation will be queued shortly.

This merge strategy is much simpler than a mainline Lamarck release, but that simplicity is necessary for seeing how multiple fusion merges behave. Where it fits into efforts toward a Lamarck v0.8 depends greatly on evaluation and feedback.
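
For intuition about the fusion steps: as I understand Arcee's description, a fusion merge adopts only the most significant parameter differences from the secondary model rather than blending everything. A toy sketch of that idea - emphatically not Arcee's actual algorithm, whose importance scoring and thresholding are more sophisticated:

```python
import torch

def toy_fusion(base: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
    """Keep base weights, adopting `other` only where the two diverge most."""
    delta = (other - base).abs()
    threshold = delta.median()  # toy stand-in for a real dynamic threshold
    return torch.where(delta > threshold, other, base)

# Tiny demo on random "weights"
torch.manual_seed(0)
base, other = torch.randn(4, 4), torch.randn(4, 4)
print(toy_fusion(base, other))
```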

### Configuration

The following YAML configuration was used to produce this model:

```yaml
name: Lamarck-14B-v0.7-Fusionvergence
merge_method: arcee_fusion
base_model: sometimesanotion/Lamarck-14B-v0.7
tokenizer_source: base
parameters:
  int8_mask: true
  normalize: true
  rescale: false
dtype: bfloat16
out_dtype: bfloat16
models:
  - model: suayptalha/Lamarckvergence-14B
---
name: Slerp-Lamarckvergence
base_model: sometimesanotion/Lamarck-14B-v0.7-Fusionvergence
merge_method: slerp
tokenizer_source: base
dtype: float32
out_dtype: bfloat16
parameters:
  t:
    - filter: self_attn
      value: [ 0.00, 0.50, 0.30, 0.70, 1.00 ]
    - filter: mlp
      value: [ 1.00, 0.50, 0.70, 0.30, 0.00 ]
    - value: [ 0.00, 0.00, 0.00, 0.00, 0.04, 0.08, 0.12, 0.16, 0.24, 0.32, 0.40, 0.48, 0.56, 0.64, 0.72, 0.72, 0.72, 0.72, 0.72, 0.72, 0.72, 0.72, 0.64, 0.56, 0.48 ]
slices:
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7-Fusionvergence
        layer_range: [ 0, 48 ]
      - model: sometimesanotion/Qwenvergence-14B-v12-Prose-DS
        layer_range: [ 0, 48 ]
---
name: Chocolatine-Fusion-Qwenvergence
merge_method: arcee_fusion
base_model: jpacifico/Chocolatine-2-14B-Instruct-v2.0.3
tokenizer_source: base
parameters:
  int8_mask: true
  normalize: true
  rescale: false
dtype: bfloat16
out_dtype: bfloat16
models:
  - model: sometimesanotion/Qwenvergence-14B-v12-Prose-DS
---
name: Lamarck-14B-v0.7-Fusion
merge_method: arcee_fusion
base_model: sometimesanotion/Slerp-Lamarckvergence
tokenizer_source: base
parameters:
  int8_mask: true
  normalize: true
  rescale: false
dtype: bfloat16
out_dtype: bfloat16
models:
  - model: sometimesanotion/Chocolatine-Fusion-Qwenvergence
```
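
One note on the SLERP stage: mergekit treats a list of `t` values as a gradient and interpolates it across the layer range, which is how the long `value:` list above shifts weight toward Qwenvergence-14B-v12-Prose-DS in the mid-to-late layers. A quick sketch of that interpolation, assuming evenly spaced anchors (close to, though not guaranteed to match, mergekit's exact spacing):

```python
import numpy as np

# The default t gradient from the slerp stage above.
anchors = [0.00, 0.00, 0.00, 0.00, 0.04, 0.08, 0.12, 0.16, 0.24, 0.32,
           0.40, 0.48, 0.56, 0.64, 0.72, 0.72, 0.72, 0.72, 0.72, 0.72,
           0.72, 0.72, 0.64, 0.56, 0.48]
num_layers = 48

# Spread the anchors evenly across the 48 layers, then linearly interpolate.
positions = np.linspace(0, num_layers - 1, num=len(anchors))
t = np.interp(np.arange(num_layers), positions, anchors)

for layer in range(0, num_layers, 8):
    # t = 0.0 keeps the base model's weights; t = 1.0 takes Prose-DS outright
    print(f"layer {layer:2d}: t = {t[layer]:.2f}")
```

Each `---`-separated document above is a complete mergekit configuration, so the stages can be run one at a time with `mergekit-yaml`, each later stage pointing at the previous stage's output.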