qicao-apple
committed on
Commit · 602b24f
1 Parent(s): 7d53f89
update OpenELM
README.md CHANGED
@@ -4,11 +4,11 @@ license_name: apple-sample-code-license
 license_link: LICENSE
 ---
 
-# OpenELM: An Efficient Language Model Family with Open
+# OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
 
 *Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari*
 
-We introduce **OpenELM**, a family of **Open
+We introduce **OpenELM**, a family of **Open** **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.
 
 Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them.
 
@@ -133,7 +133,7 @@ pip install tokenizers>=0.15.2 transformers>=4.38.2 sentencepiece>=0.2.0
 ```bash
 
 # OpenELM-270M
-hf_model=OpenELM-270M
+hf_model=apple/OpenELM-270M
 
 # this flag is needed because lm-eval-harness set add_bos_token to False by default, but OpenELM uses LLaMA tokenizer which requires add_bos_token to be True
 tokenizer=meta-llama/Llama-2-7b-hf
@@ -195,7 +195,7 @@ If you find our work useful, please cite:
 
 ```BibTex
 @article{mehtaOpenELMEfficientLanguage2024,
-title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open}
+title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open} {Training} and {Inference} {Framework}},
 shorttitle = {{OpenELM}},
 url = {https://arxiv.org/abs/2404.14619v1},
 language = {en},
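The second hunk switches `hf_model` to the fully qualified Hub ID `apple/OpenELM-270M` and leaves the Llama 2 tokenizer override in place. As a rough sketch of how these two variables might be combined into one `--model_args` value for lm-eval-harness: the `add_bos_token` variable, the `trust_remote_code=True` entry, and the exact key list below are assumptions for illustration, since the diff does not show those lines.

```bash
#!/usr/bin/env bash
# Values taken from the README lines shown in the diff.
hf_model=apple/OpenELM-270M
tokenizer=meta-llama/Llama-2-7b-hf

# Assumption: this variable is not shown in the diff. The README comment says
# the LLaMA tokenizer requires add_bos_token to be True.
add_bos_token=True

# Assumption: comma-separated key=value pairs, in the style that
# lm-eval-harness accepts for its --model_args option.
model_args="pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer}"

# Print the assembled string so it can be inspected before running an evaluation.
echo "${model_args}"
```

With the pre-commit value `hf_model=OpenELM-270M`, the `pretrained=` entry would not name the `apple` organization on the Hub, which is presumably why the commit adds the prefix.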