DeepSeekDDM committed on
Commit 0cf1748 · verified · 1 Parent(s): 108e1e0

Update README_WEIGHTS.md

  1. README_WEIGHTS.md +3 -3
README_WEIGHTS.md CHANGED
@@ -18,7 +18,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
  - Input/output embedding layers and a complete set of 61 Transformer hidden layers.
  - **Parameter Count**:
  - Total parameters: **671B**
- - Activation parameters: **36.6B**.
+ - Activation parameters: **36.6B** (including 0.9B for the output Head).
 
  #### Structural Details
 
@@ -35,8 +35,8 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
  - **Composition**:
  - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
  - **Parameter Count**:
- - Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head).
- - Activation parameters: **2.4B** (including the shared 0.9B Embedding and 0.9B output Head).
+ - Parameters: **11.5B unique parameters** (excluding the shared 0.9B Embedding and 0.9B output Head).
+ - Activation parameters: **1.5B** (including 0.9B for the output Head).
 
  #### Structural Details
 
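The arithmetic behind the changed figures can be checked with a short sketch. All values are the ones quoted in this diff, in billions. The helper name `without_shared` is hypothetical, and the reading that the old 2.4B figure differs from the new 1.5B figure only by the 0.9B Embedding is an inference from the two wordings, not something the commit states explicitly.

```python
# Figures quoted in the diff, in billions of parameters.
EMBEDDING_B = 0.9
OUTPUT_HEAD_B = 0.9

MAIN_ACTIVATED_B = 36.6     # new wording: includes 0.9B for the output Head
MTP_ACTIVATED_OLD_B = 2.4   # old wording: included Embedding and output Head
MTP_ACTIVATED_NEW_B = 1.5   # new wording: includes only the output Head


def without_shared(activated_b, *shared_b):
    """Hypothetical helper: subtract shared components from an activated count.

    Rounds to one decimal place to avoid floating-point noise, since the
    source figures are only given to that precision.
    """
    return round(activated_b - sum(shared_b), 1)


# Assumption: the old and new MTP figures differ only by the Embedding.
assert without_shared(MTP_ACTIVATED_OLD_B, EMBEDDING_B) == MTP_ACTIVATED_NEW_B

# Activated parameters with the output Head also excluded.
mtp_core_b = without_shared(MTP_ACTIVATED_NEW_B, OUTPUT_HEAD_B)
main_core_b = without_shared(MAIN_ACTIVATED_B, OUTPUT_HEAD_B)

print(f"MTP activated, excluding output Head:  {mtp_core_b}B")
print(f"Main activated, excluding output Head: {main_core_b}B")
```

Under that reading, the commit changes which shared components count as activated and leaves the MTP module's own activated parameters unchanged.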