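Assuming kv_count here is the field shown for GGUF files, it counts the metadata key-value pairs stored in the file header (entries such as the architecture name, context length, and tokenizer settings). A change from 34 to 40 then means the file carries six more metadata entries. It does not by itself indicate any change to the attention key/value tensors of a Transformer-type model such as LLaMA.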
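To make the kv_count figure concrete, here is a minimal sketch that reads the counts from the start of a GGUF file. It assumes the file follows the GGUF layout for version 2 or later (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key-value count, little-endian). The function names are hypothetical and are not part of mergekit or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (kv count)


def read_gguf_counts(header: bytes) -> dict:
    """Parse the fixed-size header of a GGUF file (assumed version >= 2)."""
    if len(header) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes of header")
    if header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # "<" = little-endian without padding; I = uint32, Q = uint64.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", header, 4)
    if version < 2:
        # Assumption: version 1 stored the counts as 32-bit values.
        raise ValueError("GGUF version 1 uses a different header layout")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "kv_count": kv_count,
    }


def read_gguf_counts_from_file(path: str) -> dict:
    """Convenience wrapper: read only the header bytes from disk."""
    with open(path, "rb") as f:
        return read_gguf_counts(f.read(HEADER_SIZE))
```

Running `read_gguf_counts_from_file` on two quantizations of the same model and comparing `kv_count` shows whether only the metadata differs. The tensor count is reported separately.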
merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged with the passthrough merge method, using Sao10K/Fimbulvetr-11B-v2 as the base.
Models Merged
The following models were included in the merge:
Configuration
The following YAML configuration was used to produce this model:
base_model: Sao10K/Fimbulvetr-11B-v2
merge_method: passthrough
dtype: float16
parameters:
  normalize: true
slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [0, 48]  # Assumes the model has 48 layers
densify:
  - linear
  - "rope:alpha=8192/4096"  # Extends the context to 8192
tokens:
  - source: Sao10K/Fimbulvetr-11B-v2
    mode: stretch
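As an illustration of what the `slices` section asks for, the sketch below expands each `layer_range` into the ordered list of layers a passthrough merge would copy. It assumes `layer_range` is half-open (start included, end excluded) and that each slice has a single source. `passthrough_layer_plan` is a hypothetical helper, not a mergekit function, and it ignores the `densify` and `tokens` keys.

```python
def passthrough_layer_plan(slices):
    """Return the (model, layer_index) pairs copied, in output order.

    Assumption: layer_range is half-open, i.e. [start, end).
    """
    plan = []
    for entry in slices:
        sources = entry["sources"]
        if len(sources) != 1:
            raise ValueError("this sketch expects exactly one source per slice")
        source = sources[0]
        start, end = source["layer_range"]
        if not 0 <= start <= end:
            raise ValueError(f"invalid layer_range: {source['layer_range']}")
        plan.extend((source["model"], i) for i in range(start, end))
    return plan


# The slices section of the configuration above, written as Python data.
slices = [
    {
        "sources": [
            {"model": "Sao10K/Fimbulvetr-11B-v2", "layer_range": [0, 48]},
        ]
    }
]

plan = passthrough_layer_plan(slices)
print(len(plan), plan[0], plan[-1])
```

Under these assumptions the plan contains 48 entries, layers 0 through 47 of the single source model, so the slice copies every layer in its original order rather than duplicating or dropping any.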