Muennighoff committed on
Commit 59facdf
1 Parent(s): e726b36

Update README.md

Files changed (1):
  1. README.md +5 -13
README.md CHANGED

@@ -123,7 +123,9 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
 * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
 
-* 6.3B billion parameters:
+* 7,069,016,064 parameters:
+
+* 1,027,604,480 embedding parameters
 
 * 30 layers, 32 attention heads
 
@@ -167,17 +169,7 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 #### **Training**
 
-_In progress._
-
-Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11-176B-ml-logs/)
-
-- Checkpoint size:
-
-  - Bf16 weights: 329GB
-
-  - Full checkpoint with optimizer states: 2.3TB
-
-- Training throughput: About 150 TFLOP per GPU per second
+Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11c-2B5-logs)
 
 - Number of epochs: 1 (*current target*)
 
@@ -185,7 +177,7 @@ Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/big
 
 - Started 11th March, 2022 11:42am PST
 
-- Estimated end: 5th July, 2022
+- Ended 5th July, 2022
 
 - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)
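The exact parameter counts added by this commit can be reproduced with the usual transformer bookkeeping. This is a minimal sketch, assuming a hidden size of 4096 and a vocabulary of 250,880 tokens (taken from the BLOOM-7B1 config; neither number appears in this diff) together with the 30 layers stated in the README:

```python
# Hypothetical re-derivation of the counts in the diff; hidden size and
# vocab size are assumptions from the BLOOM-7B1 config, not from this diff.
hidden = 4096
vocab = 250_880
layers = 30

# Token embedding matrix (shared with the LM head, so counted once).
embedding = vocab * hidden

# Per decoder layer: fused QKV + output projections (4h^2 + 4h),
# the 4x-wide MLP (8h^2 + 5h), and two LayerNorms (4h).
# ALiBi positional encodings add no learned parameters.
per_layer = 12 * hidden**2 + 13 * hidden

# Two LayerNorms outside the stack (after the embedding, before the head).
extra_norms = 2 * 2 * hidden

total = embedding + layers * per_layer + extra_norms
print(f"{embedding:,}")  # 1,027,604,480 embedding parameters
print(f"{total:,}")      # 7,069,016,064 parameters
```

Both printed values match the figures the commit adds, which suggests the new numbers count every learned weight and bias rather than rounding to "7B".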