---
license: apache-2.0
datasets:
- LargeWorldModel/ultrachat_qa_mix_512K
language:
- en
- zh
---
# Model Card for llama3-8B-360Zhinao-360k-Instruct

llama3-8B-360Zhinao-360k-Instruct is 360Zhinao's extension of llama3-8B-Instruct to a 360k context window.

Within the 360k-token length, llama3-8B-360Zhinao-360k-Instruct achieves:

- **100%** perfect recall on the "value retrieval" variant of NIAH (Needle-In-A-Haystack), which requires the model to retrieve the number in the inserted needle "The special magic {random city} number is {random 7-digit number}" (a construction sketch follows below).

- **99.75%** near-perfect recall on the [original NIAH](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) and its corresponding Chinese counterpart, where the needle (e.g. "The best thing to do in San Francisco is...") and the haystack (e.g. Paul Graham's essays, which inevitably talk about San Francisco) are topically related, hence a more difficult task. Other models with 100% recall on value retrieval can struggle with this NIAH version.

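For concreteness, here is a minimal, hypothetical sketch of how a single "value retrieval" sample can be built; the city, the number range, and the question phrasing are illustrative assumptions, not the exact evaluation harness:

```shell
# hypothetical sketch of one "value retrieval" NIAH sample (values are made up)
city="Marrakech"                         # stand-in for the random city
number=$(shuf -i 1000000-9999999 -n 1)   # random 7-digit number
needle="The special magic ${city} number is ${number}."
question="What is the special magic ${city} number?"
# the needle is inserted at a chosen depth inside a long distractor haystack,
# and the model must answer the question with ${number}
echo "needle: ${needle}"
echo "question: ${question}"
```
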
## 360k-NIAH (Needle-In-A-Haystack) results

### "value retrieval" variant of NIAH
<img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/llama3-8B-360Zhinao-360k-Instruct.value_score.png?raw=true" width="600" />

### Original NIAH
<img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/llama3-8B-360Zhinao-360k-Instruct.en_score.png?raw=true" width="600" />

### Chinese NIAH
<img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/llama3-8B-360Zhinao-360k-Instruct.zh_score.png?raw=true" width="600" />

### Remarks

We found that [the "value retrieval" variant of NIAH](https://github.com/Arize-ai/LLMTest_NeedleInAHaystack) (recently widely used in e.g. Gemini, LWM and gradient.ai) is relatively easy.
100% all-green results on value retrieval do not necessarily mean near-perfect results on more difficult NIAH tasks, as demonstrated by this [original-version NIAH](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) result of one open-sourced llama3-8B-262k model:
<img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/open-262k.en_score.png?raw=true" width="600" />

That 262k model does achieve 100% all-green results on value retrieval, but less than satisfactory results on the original version.

## Usage

llama3-8B-360Zhinao-360k-Instruct can be launched with [vllm](https://github.com/vllm-project/vllm).
To perform inference on 360k-token inputs, we used an 8 x 80GB machine (A800).

```shell
# first positional argument: path to the model weights
model_path=${1}

export ENV_PORT=7083
export ENV_TP=8                          # tensor parallelism across the 8 GPUs
export ENV_MODEL_PATH=$model_path
echo ${ENV_MODEL_PATH}
export ENV_MAX_MODEL_LEN=365000          # slightly above 360k tokens for headroom
export ENV_MAX_BATCH_TOKENS=365000
export ENV_GPU_MEMORY_UTIL=0.6           # cap the fraction of GPU memory vllm pre-allocates

# reduce CUDA allocator fragmentation on very long sequences
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256
python -m vllm.entrypoints.openai.api_server \
    --model "${ENV_MODEL_PATH:-/workspace/model}" \
    --tensor-parallel-size "${ENV_TP:-2}" \
    --trust-remote-code \
    --port "${ENV_PORT:-8002}" \
    --gpu-memory-utilization "${ENV_GPU_MEMORY_UTIL:-0.92}" \
    --max-num-batched-tokens "${ENV_MAX_BATCH_TOKENS:-18000}" \
    --max-model-len "${ENV_MAX_MODEL_LEN:-4096}" \
    --max-num-seqs "${ENV_MAX_NUM_SEQS:-32}" \
    --enforce-eager \
    > log8.server 2>&1
```
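
Once the server is up, it exposes the standard OpenAI-compatible endpoints. Below is a minimal sketch of a chat-completion request, assuming the `ENV_PORT` and `ENV_MODEL_PATH` values set above (vllm serves the model under the name passed to `--model`):

```shell
# minimal request against the server launched above (model name = --model path)
curl "http://localhost:${ENV_PORT:-7083}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
          "model": "'"${ENV_MODEL_PATH}"'",
          "messages": [{"role": "user", "content": "Hello, who are you?"}],
          "max_tokens": 64
        }'
```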

<!-- NIAH scripts -->

## Methods

llama3-8B-360Zhinao-360k-Instruct is trained from [llama3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
Its original context length is 8k with RoPE base 500,000.

We directly extended its context length to 360k. We changed the RoPE base to 500,000,000 and trained on a combined SFT dataset of [LWM's open-sourced data](https://huggingface.co/LargeWorldModel) and internal long-context data in Chinese and English.
We implemented SFT on top of [EasyContext](https://github.com/jzhang38/EasyContext/), but later found that turning on the pretraining loss produced much better results.

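The RoPE-base change should be visible directly in the released checkpoint's config; a quick way to check it, assuming the model is published under the Hugging Face repo id used below:

```shell
# inspect the extended model's RoPE base and context window
# (the repo id below is an assumption; substitute your local path if needed)
python -c "
from transformers import AutoConfig
cfg = AutoConfig.from_pretrained('qihoo360/llama3-8B-360Zhinao-360k-Instruct')
print('rope_theta:', cfg.rope_theta)
print('max_position_embeddings:', cfg.max_position_embeddings)
"
```
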
## Contact & License

Email: [email protected]

The source code of this repository is released under the Apache 2.0 open-source license.
This project is built on other open-source projects, including llama3, LWM and EasyContext, whose original licenses should also be followed by users.