---
base_model:
- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct
- rinna/llama-3-youko-8b
- rinna/llama-3-youko-8b-instruct
- tokyotech-llm/Llama-3-Swallow-8B-v0.1
- tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
- shisa-ai/shisa-v1-llama3-8b
- lmg-anon/vntl-llama3-8b-v2-qlora
library_name: transformers
tags:
- mergekit
- merge
- translation
- japanese_media
- otaku_media
- visual_novels
- VNs
language:
- en
- ja
---
# Llama-3-VNTL-Yollisa-8B

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

This merge expands on the idea of [merging at extremely low weight as an alternative to finetuning](https://huggingface.co/grimjim/kukulemon-v3-soul_mix-32k-7B), with the added step of subtracting the base model from the finetunes before merging.

The instruct format is the custom version of llama3 that VNTL uses, but you should be able to mix in some regular llama3 formats as well. With the right prompt, that might even improve translation quality.

## Usage

### Samplers

No recommended samplers yet. Stick with `temp: 0` or `top_k: 1` for now.
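
Both settings amount to greedy decoding: the highest-scoring token is always picked. The toy sampler below is only a sketch of that behaviour, not the code of any particular backend. The `sample_token` function and the example logits are hypothetical, and treating `temperature == 0` as an argmax is an assumption about how backends commonly handle it.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, rng=None):
    """Pick a token index from raw logits (toy sampler, hypothetical helper)."""
    rng = rng or random.Random()
    # Assumption: temp 0 is handled as greedy decoding, since dividing by zero is undefined.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Rank candidates by score and keep only the top_k best (0 means keep all).
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax weights over the surviving candidates, shifted by the max for stability.
    top = logits[order[0]] / temperature
    weights = [math.exp(logits[i] / temperature - top) for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


# Made-up logits for a four-token vocabulary; index 1 has the highest score.
logits = [1.2, 3.4, 0.5, 3.1]
greedy = sample_token(logits, temperature=0)
top1 = [
    sample_token(logits, temperature=1.5, top_k=1, rng=random.Random(seed))
    for seed in range(20)
]
```

With `top_k=1` only one candidate survives the cut, so the temperature value no longer matters and every draw returns the same token as `temperature=0`.
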
## Configuration

The following YAML configuration was used to produce this model:

### Llama-3-Yollow-8B

```yaml
models:
  # Pivot model
  - model: meta-llama/Meta-Llama-3-8B
  # Target models
  - model: rinna/llama-3-youko-8b
  - model: tokyotech-llm/Llama-3-Swallow-8B-v0.1
merge_method: sce
base_model: meta-llama/Meta-Llama-3-8B
parameters:
  select_topk: 1.0
dtype: float32
```

### Llama-3-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Youko-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: rinna/llama-3-youko-8b-instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: rinna/llama-3-youko-8b-instruct
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Swallow-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Shisa-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: shisa-ai/shisa-v1-llama3-8b
parameters:
  normalize: false
dtype: float32
```

### Llama-3-VNTL-Yollisa-8B

```yaml
models:
  # Base
  - model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
    parameters:
      weight: 1.0
  # Models
  - model: Casual-Autopsy/Llama-3-Minus-Base-8B
    parameters:
      density: 0.35
      weight: 10e-5
  - model: Casual-Autopsy/Llama-3-Shisa-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Swallow-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Youko-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
merge_method: ties
base_model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
parameters:
  normalize: false
  int8_mask: false
dtype: float32
```
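
As a rough illustration of the idea behind these configs (subtract the base model from a finetune, then mix the result in at a very low weight), here is a toy sketch on three-element weight lists. It shows the concept only and is not what mergekit computes for the configs above: mergekit's `task_arithmetic` and `ties` methods work on differences relative to the configured `base_model`, and `ties` additionally trims each difference by `density` and resolves sign conflicts. The helper names and all tensor values are hypothetical; only the `25e-5` weight is taken from the final config.

```python
def subtract_base(finetune, base):
    """Return the element-wise difference finetune - base."""
    return [f - b for f, b in zip(finetune, base)]


def add_at_low_weight(target, deltas_and_weights):
    """Add each delta to the target, scaled by its (very small) weight."""
    merged = list(target)
    for delta, weight in deltas_and_weights:
        merged = [m + weight * d for m, d in zip(merged, delta)]
    return merged


# Made-up values standing in for one weight tensor of each model.
base = [0.10, -0.20, 0.30]
finetune = [0.30, -0.10, 0.00]
target = [0.50, 0.50, 0.50]

delta = subtract_base(finetune, base)                  # roughly [0.2, 0.1, -0.3]
merged = add_at_low_weight(target, [(delta, 25e-5)])   # target nudged by delta * 0.00025
```

At a weight of `25e-5`, each value of the target moves by less than one ten-thousandth, which is why the result stays very close to the base of the final merge.
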