---
inference: false
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- mixtral
- mergekit
- merge
license: apache-2.0
datasets:
- jondurbin/airoboros-3.2
---
# Air-Striker-Mixtral-8x7B-Instruct-ZLoss
An experimental model, trained using a config and [Transformers/Axolotl](https://github.com/DocShotgun/axolotl) forks provided by [Doctor-Shotgun](https://huggingface.co/Doctor-Shotgun).

The model was fine-tuned from [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) on the airoboros-3.2 dataset for 4 epochs, using the ChatML prompt format at 8K context length.
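For reference, here is a minimal sketch of a single-turn prompt in the ChatML format (standard `<|im_start|>`/`<|im_end|>` role tags; the message contents below are illustrative placeholders):

```python
# Minimal sketch of the ChatML prompt format used for fine-tuning.
# The role tags follow the standard ChatML convention; the message
# contents are illustrative placeholders.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"  # the model completes from here
    )

print(build_chatml_prompt("You are a helpful assistant.", "Hello!"))
```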
The model was then merged with [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1):
---
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Merge Details

### Merge Method

This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method.
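As a rough illustration of what the linear method computes (a simplified sketch, not mergekit's actual implementation): every parameter tensor of the merged model is the weight-normalized average of the corresponding tensors from the input models.

```python
import torch

# Simplified sketch of a linear ("model soup") merge: each output tensor is
# the weight-normalized average of the corresponding input tensors.
# Not mergekit's actual implementation.
def linear_merge(state_dicts: list[dict], weights: list[float]) -> dict:
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        acc = sum(w * sd[name].float() for sd, w in zip(state_dicts, weights))
        merged[name] = (acc / total).to(torch.bfloat16)  # matches `dtype: bfloat16`
    return merged
```

With the 0.5/0.5 weights used here, this reduces to a plain average of the two models' parameters.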
### Models Merged

The following models were included in the merge:
* mistralai/Mixtral-8x7B-Instruct-v0.1
* LoneStriker/Air-Striker-Mixtral-8x7B-ZLoss

### Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    parameters:
      weight: 0.5
  - model: LoneStriker/Air-Striker-Mixtral-8x7B-ZLoss
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```
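A minimal loading sketch with `transformers` (the repo id below is assumed from the model name in this card; adjust it to the actual repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model name above; adjust if it differs.
model_id = "LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge was produced in bfloat16
    device_map="auto",
)

# ChatML-style prompt, matching the fine-tuning format described above.
prompt = (
    "<|im_start|>user\nExplain what a linear model merge is.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```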