Airophin: An NTK-by-Parts RoPE Scaled QLoRA Fine-tune of Llama-2-13b (LoRA weights)
GPTQ weights can be found here: https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-GPTQ
fp16 weights can be found here: https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16
Overview
This is a finetune of Llama-2-13b, intended to extend the useful context window to 16384 tokens. There are two training phases:
- It is first trained on a long-context (7000-8192 tokens) subset of dolphin, an orca-like dataset (GPT4 split only). This amounts to roughly 110 million tokens. An Airoboros-like training prompt was used instead of the dolphin system prompt. Training was done with partial NTK scaling applied (scale factor of 4). This took ~20 hours.
- The model was then finetuned on Jon Durbin's Airoboros GPT4 1.4.1, with the same scaling approach, for 2 epochs. This took ~15 hours.
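To make "partial NTK scaling" more concrete, here is a minimal sketch of how NTK-by-parts RoPE frequencies are commonly computed. It is not taken from the training code for this model. The function name and the alpha and beta thresholds (1 and 32) are assumptions chosen for illustration, while the head dimension of 128, RoPE base of 10000, and original context of 4096 are the standard Llama-2-13b values. The idea is that high-frequency dimensions are left untouched, low-frequency dimensions are interpolated by the full scale factor, and dimensions in between are blended linearly.

```python
import math

def ntk_by_parts_inv_freq(head_dim=128, base=10000.0, scale=4.0,
                          original_ctx=4096, alpha=1.0, beta=32.0):
    """Return per-dimension RoPE inverse frequencies with NTK-by-parts scaling.

    alpha and beta are assumed thresholds, not values confirmed by this model card.
    """
    scaled = []
    for i in range(0, head_dim, 2):
        inv_freq = base ** (-i / head_dim)        # standard RoPE frequency
        wavelength = 2 * math.pi / inv_freq       # tokens per full rotation
        rotations = original_ctx / wavelength     # rotations within the original context
        # Ramp: 0 -> fully interpolate (divide by scale), 1 -> leave unchanged
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        scaled.append((1 - gamma) * inv_freq / scale + gamma * inv_freq)
    return scaled

freqs = ntk_by_parts_inv_freq()
# Highest-frequency dimension is unchanged; lowest-frequency one is divided by the scale.
print(freqs[0], freqs[-1])
```

With a scale factor of 4 and an original context of 4096 tokens, the target context is 4 x 4096 = 16384 tokens, which matches the window stated above.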
This is a QLoRA fine-tune (rank 64).
All training was performed with 1x RTX 6000 Ada.
For the full model card, including how to use PNTK, see either of the two merged models linked above.
IMPORTANT: There are TWO sets of adapter weights. adapter_model_base is to be applied to llama-2-13b. The result can then be merged with adapter_model. These adapters correspond to the first and second training phases, respectively. Applying the second-phase adapter directly to base llama-2-13b will produce different results from the merged full airophin model.
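The two-step merge could look roughly like the sketch below, which uses the transformers and peft libraries. The function name and its three arguments are hypothetical. In particular, it assumes each adapter has been placed in its own local directory in the usual PEFT layout (an adapter config plus weights), which may not match how the files are stored in this repository. It also does not apply the PNTK RoPE scaling needed at inference; see the merged model cards for that.

```python
def merge_two_phase_adapters(base_model_id, phase1_adapter_dir, phase2_adapter_dir):
    """Apply the phase-1 adapter to the base model, merge it, then apply and merge phase 2.

    All three arguments are hypothetical placeholders supplied by the caller.
    """
    # Imported lazily so the function can be defined without the libraries installed.
    import torch
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

    # Phase 1: adapter_model_base goes on top of base llama-2-13b.
    model = PeftModel.from_pretrained(model, phase1_adapter_dir)
    model = model.merge_and_unload()

    # Phase 2: adapter_model goes on top of the phase-1 merged weights.
    model = PeftModel.from_pretrained(model, phase2_adapter_dir)
    model = model.merge_and_unload()
    return model
```

The order matters because the second adapter was trained against weights that already included the first one, so skipping phase 1 gives a different model.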