---
license: mit
---

```
===================================================================================================================
Layer (type:depth-idx)                                           Output Shape             Param #
===================================================================================================================
MegaForMaskedLM                                                  [4, 2048, 50265]         --
├─MegaModel: 1-1                                                 [4, 2048, 768]           --
│    └─MegaEmbeddings: 2-1                                       [4, 2048, 768]           --
│    │    └─Embedding: 3-1                                       [4, 2048, 768]           38,603,520
│    └─ModuleList: 2-2                                           --                       --
│    │    └─MegaBlock: 3-2                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-3                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-4                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-5                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-6                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-7                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-8                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-9                                       [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-10                                      [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-11                                      [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-12                                      [2048, 4, 768]           6,202,626
│    │    └─MegaBlock: 3-13                                      [2048, 4, 768]           6,202,626
├─Linear: 1-2                                                    [4, 2048, 50265]         38,653,785
===================================================================================================================
Total params: 151,688,817
Trainable params: 151,688,817
Non-trainable params: 0
Total mult-adds (G): 150.35
===================================================================================================================
Input size (MB): 0.07
Forward/backward pass size (MB): 10818.75
Params size (MB): 606.71
Estimated Total Size (MB): 11425.52
===================================================================================================================
```
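
As a quick sanity check, the parameter counts in the summary can be reproduced with plain arithmetic. The sketch below takes the vocabulary size (50265) and hidden size (768) from the output shapes and copies the per-block count from the table rather than deriving it. It assumes the output `Linear` layer has a bias and does not share weights with the embedding, which is consistent with the reported numbers but is not stated in this card. The variable names are illustrative only.

```python
# Values read from the summary table.
vocab_size = 50265            # last dimension of the MegaForMaskedLM output
hidden_size = 768             # last dimension of the MegaModel output
num_blocks = 12               # MegaBlock entries 3-2 through 3-13
params_per_block = 6_202_626  # copied from the table, not derived here

# Embedding: one hidden_size vector per vocabulary entry.
embedding_params = vocab_size * hidden_size

# Output projection: weight matrix plus bias (assumption: bias present, weights untied).
lm_head_params = hidden_size * vocab_size + vocab_size

total_params = embedding_params + num_blocks * params_per_block + lm_head_params

print(f"Embedding: {embedding_params:,}")  # 38,603,520
print(f"Linear:    {lm_head_params:,}")    # 38,653,785
print(f"Total:     {total_params:,}")      # 151,688,817
```

The three results match the `Embedding`, `Linear`, and `Total params` rows of the summary.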