DeepSeekDDM committed on
Update README_WEIGHTS.md
README_WEIGHTS.md +3 -3
README_WEIGHTS.md
CHANGED
@@ -18,7 +18,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - Input/output embedding layers and a complete set of 61 Transformer hidden layers.
 - **Parameter Count**:
 - Total parameters: **671B**
-- Activation parameters: **36.6B
+- Activation parameters: **36.6B** (including 0.9B for the output Head).
 
 #### Structural Details
 
@@ -35,8 +35,8 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - **Composition**:
 - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
 - **Parameter Count**:
-- Parameters: **11.5B unique parameters
-- Activation parameters: **
+- Parameters: **11.5B unique parameters** (excluding the shared 0.9B Embedding and 0.9B output Head).
+- Activation parameters: **1.5B** (including 0.9B for the output Head).
 
 #### Structural Details
 
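The qualifiers added in this commit ("including 0.9B for the output Head", "excluding the shared 0.9B Embedding and 0.9B output Head") change what the headline numbers mean. A few lines of Python make that arithmetic explicit. This is a minimal sketch: the constant and function names are hypothetical, and the only figures used (in billions) are the ones quoted in the new README text.

```python
# Figures in billions, taken from the updated README_WEIGHTS.md lines.
# Names below are hypothetical labels for this sketch only.
OUTPUT_HEAD_B = 0.9   # output Head, shared between main model and MTP module
EMBEDDING_B = 0.9     # Embedding, shared between main model and MTP module

MAIN_ACTIVATED_INCL_HEAD = 36.6
MTP_UNIQUE_EXCL_SHARED = 11.5
MTP_ACTIVATED_INCL_HEAD = 1.5


def activated_excluding_head(activated_incl_head: float) -> float:
    """Remove the output Head, which the README counts inside 'Activation parameters'."""
    return round(activated_incl_head - OUTPUT_HEAD_B, 1)


def mtp_params_incl_shared(unique_excl_shared: float) -> float:
    """Parameters the MTP module touches once the shared Embedding and Head are added back.

    The shared 1.8B already belongs to the main model, so this is not extra storage.
    """
    return round(unique_excl_shared + EMBEDDING_B + OUTPUT_HEAD_B, 1)


print(activated_excluding_head(MAIN_ACTIVATED_INCL_HEAD))  # 35.7
print(activated_excluding_head(MTP_ACTIVATED_INCL_HEAD))   # 0.6
print(mtp_params_incl_shared(MTP_UNIQUE_EXCL_SHARED))      # 13.3
```

`round(..., 1)` is used because sums such as `36.6 - 0.9` are not exact in binary floating point.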