Suparious committed 35c0f7b (verified) · 1 parent: 747d768

Update README.md

Files changed (1): README.md (+34, −1)
README.md CHANGED
@@ -1,4 +1,7 @@
 ---
+language:
+- en
+base_model: winglian/Llama-3-8b-64k-PoSE
 library_name: transformers
 tags:
 - 4-bit
@@ -6,16 +9,46 @@ tags:
 - text-generation
 - autotrain_compatible
 - endpoints_compatible
+- axolotl
+- finetune
+- dpo
+- facebook
+- meta
+- pytorch
+- llama
+- llama-3
+- 64k
+- pose
 pipeline_tag: text-generation
-inference: false
 quantized_by: Suparious
+license: llama3
+license_name: llama3
+license_link: LICENSE
+inference: false
+model_creator: MaziyarPanahi
+model_name: Llama-3-8B-Instruct-64k
+datasets:
+- Intel/orca_dpo_pairs
 ---

# MaziyarPanahi/Llama-3-8B-Instruct-64k AWQ

- Model creator: [MaziyarPanahi](https://huggingface.co/MaziyarPanahi)
- Original model: [Llama-3-8B-Instruct-64k](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k)

<img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

## Model Summary

This model builds on the great work of [@winglian](https://huggingface.co/winglian/) and his latest model, [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE/):

> This model uses [PoSE](https://huggingface.co/papers/2309.10400) to extend Llama's context length from 8k to 64k @ rope_theta: 500000.0.
> We used PoSE with continued pretraining on 300M tokens from the RedPajama V1 dataset using data between 6k-8k tokens.
> We have further set rope_theta to 2M after continued pre-training to potentially further extend the context past 64k.
> This was trained on a subset of the RedPajama v1 dataset with text between 6k-8k context. We trained a rank stabilized LoRA of rank 256. [WandB](https://wandb.ai/oaaic/llama-3-64k/runs/tkcyjt37)
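Since the quoted note hinges on `rope_theta` and the extended window, here is a minimal sketch for checking what the shipped config actually contains, assuming the standard `transformers` API (inspection only; only the config file is downloaded, not the weights):

```python
from transformers import AutoConfig

# Repo id taken from this card; AutoConfig fetches config.json only.
model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"
config = AutoConfig.from_pretrained(model_id)

print(config.rope_theta)               # RoPE base frequency (the note cites 500000.0, later 2M)
print(config.max_position_embeddings)  # extended context window (64k)
```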
### Quantized GGUF

All GGUF models come with a context length of `64000`: [MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF)
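For the GGUF variants linked above, a minimal sketch with `llama-cpp-python`; the local filename is an assumption, so substitute whichever quant file you downloaded from that repo:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Assumed local filename; download any quant from the -GGUF repo first.
llm = Llama(
    model_path="./Llama-3-8B-Instruct-64k.Q4_K_M.gguf",
    n_ctx=64000,  # matches the advertised context length
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give a one-line summary of PoSE."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```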
 
## How to use
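A minimal sketch of loading an AWQ-quantized Llama 3 with AutoAWQ; the repo id below is a placeholder for this card's AWQ repo, and a CUDA device is assumed:

```python
from awq import AutoAWQForCausalLM  # pip install autoawq
from transformers import AutoTokenizer

# Placeholder: substitute the id of this AWQ repo.
quant_path = "<this-awq-repo-id>"

# fuse_layers speeds up inference by fusing attention/MLP layers.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

prompt = "Explain PoSE context extension in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```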