jacobfulano committed
Commit c3d9ccf · 1 Parent(s): 3a454d2

Update README.md

Files changed (1):
  1. README.md +7 -5
README.md CHANGED
@@ -11,16 +11,18 @@ inference: false
 
  MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining.
  MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against
- Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased).
+ Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased). It incorporates efficiency insights
+ from the past half a decade of transformers research, from RoBERTa to T5 and GPT.
 
  __This model was trained with [ALiBi](https://arxiv.org/abs/2108.12409) on a sequence length of 2048 tokens.__
 
  ALiBi allows a model trained with a sequence length n to easily extrapolate to sequence lengths >2n during finetuning. For more details, see [Train Short, Test Long: Attention with Linear
  Biases Enables Input Length Extrapolation (Press et al. 2022)](https://arxiv.org/abs/2108.12409)
 
- It is part of the family of MosaicBERT-Base models:
+ It is part of the **family of MosaicBERT-Base models** trained using ALiBi on different sequence lengths:
 
  * [mosaic-bert-base](https://huggingface.co/mosaicml/mosaic-bert-base) (trained on a sequence length of 128 tokens)
+ * [mosaic-bert-base-seqlen-256](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-256)
  * [mosaic-bert-base-seqlen-512](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-512)
  * [mosaic-bert-base-seqlen-1024](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-1024)
  * mosaic-bert-base-seqlen-2048
@@ -40,7 +42,7 @@ April 2023
 
  ```python
  from transformers import AutoModelForMaskedLM
- mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
+ mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-2048', trust_remote_code=True)
  ```
 
  The tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
@@ -56,7 +58,7 @@ To use this model directly for masked language modeling, use `pipeline`:
  from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
 
  tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
- mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
+ mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-2048', trust_remote_code=True)
 
  classifier = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
 
@@ -73,7 +75,7 @@ This model requires that `trust_remote_code=True` be passed to the `from_pretrained`
 
  ```python
  mlm = AutoModelForMaskedLM.from_pretrained(
-     'mosaicml/mosaic-bert-base',
+     'mosaicml/mosaic-bert-base-seqlen-2048',
      trust_remote_code=True,
      revision='24512df',
  )
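
As background for the ALiBi description in the README excerpt above: ALiBi replaces learned position embeddings with a per-head linear penalty on attention scores proportional to query-key distance, which is what lets a model trained at sequence length n extrapolate past 2n. The sketch below is illustrative only; it follows Press et al. 2022, assumes a power-of-two head count for the slope formula, and uses a symmetric |i - j| penalty for bidirectional attention. It is not MosaicBERT's actual implementation.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Illustrative ALiBi bias of shape (n_heads, seq_len, seq_len)."""
    # Geometric head slopes from Press et al. 2022: 2^(-8/n), 2^(-16/n), ...
    # (this simple formula assumes n_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Symmetric distance penalty |i - j|, assumed here for bidirectional (BERT-style) attention.
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).abs()
    # The bias is added to the raw attention scores before softmax; since there are no
    # learned position embeddings, a longer sequence only extends this distance grid.
    return -slopes[:, None, None] * dist[None, :, :]
```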
 
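
For reference, the updated snippets in this diff combine into the following end-to-end masked-language-modeling example. The query sentence and the `print` call are illustrative additions; everything else mirrors the README snippets above.

```python
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline

# Tokenizer is the standard Hugging Face bert-base-uncased tokenizer, per the README.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# trust_remote_code=True is required by this checkpoint, per the README.
mlm = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base-seqlen-2048',
    trust_remote_code=True,
)

classifier = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)

# Illustrative query; the pipeline returns the top predictions for the [MASK] token.
print(classifier("The capital of France is [MASK]."))
```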