Update README.md
### Overview:
Lamarck-14B is a carefully designed merge that emphasizes [arcee-ai/Virtuoso-Small](https://huggingface.co/arcee-ai/Virtuoso-Small) in its early and finishing layers, while its middle layers draw strong influence on reasoning and prose from [CultriX/SeQwence-14B-EvolMerge](http://huggingface.co/CultriX/SeQwence-14B-EvolMerge) especially, along with a number of other models through its model_stock.
Version 0.3 is the product of a carefully planned and tested sequence of templated merges, produced by a toolchain which wraps around Arcee's mergekit.
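To give a concrete (but hypothetical) sense of what one templated stage can look like, here is a minimal mergekit model_stock configuration. The donor list and dtype below are assumptions for the sketch, not the actual recipe behind this release; only Virtuoso-Small and SeQwence-14B-EvolMerge are named in this card.

```yaml
# Minimal model_stock sketch, not the exact Lamarck recipe.
# "your-org/another-qwen2.5-14b-finetune" is a hypothetical placeholder donor.
merge_method: model_stock
base_model: arcee-ai/Virtuoso-Small
models:
  - model: CultriX/SeQwence-14B-EvolMerge
  - model: your-org/another-qwen2.5-14b-finetune
dtype: bfloat16
```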
For GGUFs, [mradermacher/Lamarck-14B-v0.3-i1-GGUF](https://huggingface.co/mradermacher/Lamarck-14B-v0.3-i1-GGUF) has you covered. Thank you @mradermacher!
![graph.png](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3-experimental/resolve/main/graph.png)
The first two layers come entirely from Virtuoso. The choice to leave these layers untouched comes from [arxiv.org/abs/2307.03172](https://arxiv.org/abs/2307.03172), which identifies attention glitches as a chief cause of hallucinations. Layers 3-8 feature a SLERP gradient that gradually introduces the DELLA merge tree.
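A hedged sketch of what such a layer-wise gradient could look like in mergekit's SLERP syntax is shown below. The layer boundaries, `t` values, and the name `sometimesanotion/Lamarck-DELLA-branch` (standing in for the intermediate DELLA merge tree) are illustrative assumptions, not the published configuration.

```yaml
# Sketch only, not the published recipe.
# t: 0 keeps a layer on the base model (Virtuoso); higher t blends in the DELLA branch.
# "sometimesanotion/Lamarck-DELLA-branch" is a hypothetical intermediate model.
merge_method: slerp
base_model: arcee-ai/Virtuoso-Small
slices:
  - sources:
      - model: arcee-ai/Virtuoso-Small
        layer_range: [0, 8]
      - model: sometimesanotion/Lamarck-DELLA-branch
        layer_range: [0, 8]
parameters:
  t: [0.0, 0.0, 0.15, 0.3, 0.45, 0.6, 0.8, 1.0]  # first two layers untouched, then a ramp
dtype: bfloat16
```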
### Thanks go to:
- @arcee-ai's team for the ever-capable mergekit, and the exceptional Virtuoso Small model.
- @CultriX for the helpful examples of memory-efficient sliced merges and evolutionary merging. Their contribution of tinyevals on version 0.1 of Lamarck did much to validate the hypotheses of the DELLA->SLERP gradient process used here.
- The authors behind the capable models that appear in the model_stock. The boost to prose quality is already noticeable.
### Models Merged: