Update README.md
README.md (CHANGED)

---
# Model Card for llama3-8B-360Zhinao-360k-Instruct

llama3-8B-360Zhinao-360k-Instruct is 360Zhinao's extension of llama3-8B-Instruct to a 360k context window [[GitHub]](https://github.com/Qihoo360/360zhinao/tree/main/360k).

Within the 360k-token length, llama3-8B-360Zhinao-360k-Instruct achieves:

```
python -m vllm.entrypoints.openai.api_server \
    ... \
    > log8.server 2>&1
```
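
For quick testing, the sketch below sends one request to the OpenAI-compatible endpoint that the command above starts. The port (vLLM's default 8000) and the served model name are assumptions here; adjust both to match the actual launch arguments.

```python
# Hypothetical client call to the OpenAI-compatible vLLM server started above.
# Port 8000 (vLLM's default) and the served model name are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "llama3-8B-360Zhinao-360k-Instruct",
        "messages": [
            {"role": "user", "content": "Summarize the following long document: ..."}
        ],
        "max_tokens": 512,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```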

## Methods

llama3-8B-360Zhinao-360k-Instruct is trained from [llama3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
Its original context length is 8k with RoPE base 500,000.

We directly extended its context length to 360k. We changed the RoPE base to 500,000,000 and trained on a combined SFT dataset of [LWM's open-sourced data](https://huggingface.co/LargeWorldModel) and internal long-context data in Chinese and English.
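
As a rough illustration of that change (not the actual training entry point), the HuggingFace config fields involved would be overridden roughly as below; the base-model repo id is the public llama3-8B-Instruct checkpoint and the exact window size is only indicative.

```python
# Sketch of the RoPE-base / context-window change described above.
# Illustrative only: the real extension was trained with EasyContext-style
# sequence parallelism, not a plain from_pretrained call.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
config.rope_theta = 500_000_000           # raised from the original 500,000
config.max_position_embeddings = 360_000  # extended window; exact value indicative

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    config=config,
    torch_dtype="auto",
)
```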

We implemented SFT on top of [EasyContext](https://github.com/jzhang38/EasyContext/) ([code](https://github.com/Qihoo360/360zhinao/blob/main/360k/train.sft.EasyContext.py), with a simple derivation of the loss reduction), but later found that turning on pretraining loss produced much better results.
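
The practical difference between the two loss settings comes down to whether prompt tokens are masked out of the labels. Below is a minimal sketch under the standard HuggingFace convention (label -100 is ignored by the loss); the token ids and prompt length are made up.

```python
# "SFT loss" vs. "pretraining loss" label construction, schematically.
# -100 is the ignore_index used by HuggingFace causal-LM losses.
import torch

input_ids = torch.tensor([[128000, 882, 9906, 1268, 527, 499, 30, 128001]])  # toy ids
prompt_len = 3  # number of instruction/prompt tokens (illustrative)

# SFT-style labels: loss only on the response tokens.
sft_labels = input_ids.clone()
sft_labels[:, :prompt_len] = -100

# Pretraining-style labels: next-token loss on every token of the packed sequence.
pretrain_labels = input_ids.clone()

# Either tensor is passed as labels= to the model's forward pass; the model
# shifts labels internally, so no manual shifting is needed here.
```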

SFT is likely suitable for further finetuning within the already extended context window.

We have been using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for several months with tailored GPU-memory optimizations. Its context parallelism wasn't quite ready back then, and we have now switched to ring-attention implementations such as EasyContext.
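
For intuition only, here is a toy single-process sketch of the ring-attention idea behind such implementations: each rank keeps one shard of the queries, key/value shards circulate around a ring, and partial results are merged with an online softmax. Real implementations overlap the shard transfer with compute and handle causal masking across shards; everything below is illustrative.

```python
# Toy, single-process illustration of ring attention (one head, no causal mask).
import numpy as np

def ring_attention(q, k, v, num_shards):
    d = q.shape[-1]
    q_shards = np.array_split(q, num_shards)
    k_shards = np.array_split(k, num_shards)
    v_shards = np.array_split(v, num_shards)

    out = []
    for i, q_i in enumerate(q_shards):            # work owned by "rank" i
        m = np.full(q_i.shape[0], -np.inf)        # running max of logits
        denom = np.zeros(q_i.shape[0])            # running softmax denominator
        acc = np.zeros_like(q_i)                  # running weighted sum of values
        for step in range(num_shards):            # K/V shard arriving at this ring step
            j = (i + step) % num_shards
            scores = q_i @ k_shards[j].T / np.sqrt(d)
            m_new = np.maximum(m, scores.max(axis=-1))
            rescale = np.exp(m - m_new)
            p = np.exp(scores - m_new[:, None])
            denom = denom * rescale + p.sum(axis=-1)
            acc = acc * rescale[:, None] + p @ v_shards[j]
            m = m_new
        out.append(acc / denom[:, None])
    return np.concatenate(out)

# Sanity check against vanilla attention on random data.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
scores = q @ k.T / np.sqrt(8)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.allclose(ring_attention(q, k, v, 4), weights @ v))  # True
```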

## Contact & License
Email: [email protected]