---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
tags:
- TransNormerLLM
---

<div align="center">
<h1>
TransNormerLLM3 -- A Faster and Better LLM
</h1>
</div>

# Introduction

This official repository releases the TransNormerLLM3 model along with open-source weight checkpoints for every 50 billion tokens processed during pre-training.

[TransNormerLLM](https://arxiv.org/abs/2307.14995) evolved from [TransNormer](https://arxiv.org/abs/2210.10340) and stands out as the first LLM built on the linear transformer architecture. It further distinguishes itself as the first non-Transformer LLM to surpass both traditional Transformers and other efficient models (such as RetNet and Mamba) in speed and performance.

# TransNormerLLM3

- **TransNormerLLM3-15B** features **14.83 billion** parameters. It is structured with **42 layers**, **40 attention heads**, and an **embedding size of 5120**.
- The **tiktoken** tokenizer is used, with a total **vocabulary size** of about **100,000**.
- It incorporates **Simple GLU** as its channel mixer, **GLA** as its token mixer, and **SRMSNorm** for normalization (a sketch of SRMSNorm follows this list).
- In terms of position encoding, the first layer employs **LRPE with exponential decay**, whereas the subsequent layers rely on **exponential decay encoding** alone.
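
As a rough illustration of the normalization choice, here is a minimal PyTorch sketch of **SRMSNorm** (Simple RMSNorm, i.e. RMSNorm without a learnable gain). It is written from the description in the TransNormerLLM paper, not taken from this repository's modeling code:

```python
import torch


class SRMSNorm(torch.nn.Module):
    """Simple RMSNorm: RMSNorm without the learnable gain, per the
    description in the TransNormerLLM paper (arXiv:2307.14995)."""

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each vector by the root-mean-square over its last dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        return x / (rms + self.eps)


# Example: normalize activations with the model's embedding size of 5120.
x = torch.randn(2, 8, 5120)   # (batch, sequence, embedding)
print(SRMSNorm()(x).shape)    # torch.Size([2, 8, 5120])
```
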
### Pre-training Logbook

* Realtime Track: https://api.wandb.ai/links/opennlplab/kip314lq
* Join the discussion: [Discord](https://discord.gg/MYQh6BWN) <<<>>> [WeChat group](https://github.com/OpenNLPLab/TransnormerLLM/blob/main/images/contact_me_qr.png)

> startup: [WeChat - Pre-training Commences](https://mp.weixin.qq.com/s/YjUY-uy89WkF75_-rBTuKw) <<<>>> [Twitter - Pre-training Commences](https://twitter.com/opennlplab/status/1739568669502611825) <<<>>> [YouTube Recording](https://t.co/wk7svS4o5r) <<<>>> [bilibili Recording](https://www.bilibili.com/video/BV11j411J7Dy)
> first week review: [WeChat - First Week Review](https://mp.weixin.qq.com/s/zwGnZZI3itNPoxzzXkuU2w) <<<>>> [Twitter - First Week Review](https://twitter.com/opennlplab/status/1742187694078501038)

# Released Weights

| param | token | Hugging Face | Model Scope | Wisemodel |
| :-----: | :---: | :----------: | :---------: | :-------: |
| **15B** | 50B | 🤗 | 🤖 | 🐯 |
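
The checkpoints should be loadable through Hugging Face Transformers. The sketch below is an assumption rather than an excerpt from this card: the repository id is a placeholder for an entry in the table above, and `trust_remote_code=True` reflects that TransNormerLLM ships custom modeling code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OpenNLPLab/TransNormerLLM3-15B"  # placeholder: use the actual checkpoint id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("TransNormerLLM3 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
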
# Benchmark Results

The evaluations of all models are conducted using the official settings and the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.

| Model | P | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | MMLU | C-Eval |
| ----------------------- | --- | ---- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ |
| **TransNormerLLM3-15B** | 15 | 0.05 | 62.08 | 72.52 | 55.55 | 57.14 | 62.12 | 31.14 | 32.40 | 27.50 | 26.18 |
| **TransNormerLLM3-15B** | 15 | 0.10 | | | | | | | | | |

> **P**: parameter size (billion). **T**: tokens (trillion). **BoolQ**: acc. **PIQA**: acc. **HellaSwag**: acc_norm. **WinoGrande**: acc. **ARC-easy**: acc. **ARC-challenge**: acc_norm. **OpenBookQA**: acc_norm. **MMLU**: 5-shot acc. **C-Eval**: 5-shot acc.
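
As a hedged reproduction sketch, the commonsense rows can be run through the harness's Python API (v0.4-style). The model id is a placeholder, shot counts fall back to the harness task defaults, and MMLU / C-Eval would need separate 5-shot runs:

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OpenNLPLab/TransNormerLLM3-15B,trust_remote_code=True,dtype=bfloat16",
    tasks=["boolq", "piqa", "hellaswag", "winogrande",
           "arc_easy", "arc_challenge", "openbookqa"],
    batch_size=8,
)
print(results["results"])
```
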
# Acknowledgments and Citation

## Acknowledgments

Our project is developed based on the following open-source projects:

- [tiktoken](https://github.com/openai/tiktoken) for the tokenizer (a usage sketch follows this list).
- [metaseq](https://github.com/facebookresearch/metaseq) for training.
- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation.
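
Since the tokenizer is tiktoken-based with a vocabulary of about 100,000, a minimal sketch follows; `cl100k_base` is an assumption chosen only because its vocabulary size matches, not an encoding this card confirms:

```python
import tiktoken

# Assumption: cl100k_base matches the ~100K vocabulary size cited above;
# the model's actual encoding may differ.
enc = tiktoken.get_encoding("cl100k_base")
print(enc.n_vocab)  # ~100K entries
print(enc.encode("TransNormerLLM3 is a faster and better LLM."))
```
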
## Citation

If you wish to cite our work, please use the following reference:

```
@article{qin2023scaling,
  title={Scaling transnormer to 175 billion parameters},
  author={Qin, Zhen and Li, Dong and Sun, Weigao and Sun, Weixuan and Shen, Xuyang and Han, Xiaodong and Wei, Yunshen and Lv, Baohong and Yuan, Fei and Luo, Xiao and others},
  journal={arXiv preprint arXiv:2307.14995},
  year={2023}
}
```

<p align="center">
- OpenNLPLab @2024 -
</p>