Update README.md
README.md CHANGED

---
language:
- en
base_model: winglian/Llama-3-8b-64k-PoSE
library_name: transformers
tags:
- 4-bit
- AWQ
- text-generation
- autotrain_compatible
- endpoints_compatible
- axolotl
- finetune
- dpo
- facebook
- meta
- pytorch
- llama
- llama-3
- 64k
- pose
pipeline_tag: text-generation
quantized_by: Suparious
license: llama3
license_name: llama3
license_link: LICENSE
inference: false
model_creator: MaziyarPanahi
model_name: Llama-3-8B-Instruct-64k
datasets:
- Intel/orca_dpo_pairs
---
# MaziyarPanahi/Llama-3-8B-Instruct-64k AWQ

- Model creator: [MaziyarPanahi](https://huggingface.co/MaziyarPanahi)
- Original model: [Llama-3-8B-Instruct-64k](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k)

<img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

## Model Summary

This model is built on the great work of [@winglian](https://huggingface.co/winglian/) and his latest model, [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE/):

> This model uses [PoSE](https://huggingface.co/papers/2309.10400) to extend Llama's context length from 8k to 64k @ rope_theta: 500000.0.
> We used PoSE with continued pretraining on 300M tokens from the RedPajama V1 dataset using data between 6k-8k tokens.
> We have further set rope_theta to 2M after continued pre-training to potentially further extend the context past 64k.
> This was trained on a subset of the RedPajama v1 dataset with text between 6k-8k context. We trained a rank stabilized LoRA of rank 256. [WandB](https://wandb.ai/oaaic/llama-3-64k/runs/tkcyjt37)

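Since the extended window and RoPE settings are the whole point of this finetune, it can be worth checking them before committing to a long-context workload. Below is a minimal sketch using `transformers.AutoConfig`; the expected values in the comments are assumptions taken from the quoted summary, and the actual numbers are whatever the repo's `config.json` ships:

```python
from transformers import AutoConfig

# Fetch only the config (no weights) from the original model repo.
config = AutoConfig.from_pretrained("MaziyarPanahi/Llama-3-8B-Instruct-64k")

# Per the summary above, expect a ~64k context window and rope_theta
# raised to 2M after continued pre-training (assumptions, not guarantees).
print(config.max_position_embeddings)
print(config.rope_theta)
```
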
### Quantized GGUF

All GGUF models come with a context length of `64000`: [MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF)

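To actually get the advertised window out of a GGUF file, the context size must be requested at load time. A minimal sketch using `llama-cpp-python`; the filename is a placeholder for whichever quant file you download from the GGUF repo:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3-8B-Instruct-64k.Q4_K_M.gguf",  # placeholder filename
    n_ctx=64000,  # request the full 64k window; the default is much smaller
)

out = llm("Q: What does PoSE extend in Llama-3?\nA:", max_tokens=32)
print(out["choices"][0]["text"])
```
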
## How to use

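A minimal sketch of the standard path for loading an AWQ checkpoint through `transformers` (AWQ support requires the `autoawq` package); the repo id below is a placeholder assumption, not a confirmed path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-awq-repo>"  # placeholder: substitute this card's repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers dispatches AWQ-quantized weights automatically when the
# checkpoint's quantization_config says "awq" and autoawq is installed.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize PoSE in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```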