Update README.md
README.md (CHANGED):

```diff
@@ -156,7 +156,7 @@ We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for liv
 
 ## 4. Model Architecture
 DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:
-- For attention, we design
+- For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
 - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
 
 <p align="center">
```
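For readers skimming the diff: the new attention bullet describes MLA's core trick, caching one small shared latent per token instead of full per-head keys and values. Below is a minimal, hypothetical PyTorch sketch of that low-rank key-value compression idea. It is not DeepSeek-V2's actual implementation: every dimension (`d_model`, `d_latent`, etc.) and name (`LowRankKVAttention`, `kv_down`, `k_up`, `v_up`) is made up for illustration, and MLA details such as decoupled RoPE keys and query compression are omitted.

```python
# Illustrative sketch (not DeepSeek's code) of low-rank key-value compression:
# cache one small latent vector per token and up-project it at attention time.
# All dimensions below are hypothetical.
import torch
import torch.nn as nn

d_model, n_heads, d_head, d_latent = 1024, 8, 128, 64  # made-up sizes

class LowRankKVAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(d_model, n_heads * d_head)
        # Joint down-projection: one shared latent per token replaces full K/V.
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-projections reconstruct per-head keys and values from the latent.
        self.k_up = nn.Linear(d_latent, n_heads * d_head)
        self.v_up = nn.Linear(d_latent, n_heads * d_head)
        self.o_proj = nn.Linear(n_heads * d_head, d_model)

    def forward(self, h, kv_cache):
        b, t, _ = h.shape
        # Only the compressed latent is appended to the cache: d_latent floats
        # per token instead of 2 * n_heads * d_head for a standard KV cache.
        kv_cache = torch.cat([kv_cache, self.kv_down(h)], dim=1)
        s = kv_cache.shape[1]
        q = self.q_proj(h).view(b, t, n_heads, d_head).transpose(1, 2)
        k = self.k_up(kv_cache).view(b, s, n_heads, d_head).transpose(1, 2)
        v = self.v_up(kv_cache).view(b, s, n_heads, d_head).transpose(1, 2)
        # Standard scaled dot-product attention (causal mask omitted for brevity).
        attn = torch.softmax(q @ k.transpose(-2, -1) / d_head**0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, n_heads * d_head)
        return self.o_proj(out), kv_cache

attn = LowRankKVAttention()
cache = torch.zeros(1, 0, d_latent)                   # empty cache
out, cache = attn(torch.randn(1, 4, d_model), cache)  # prefill 4 tokens
print(out.shape, cache.shape)  # (1, 4, 1024) and (1, 4, 64)
```

With a conventional cache, each token would store `2 * n_heads * d_head = 2048` values here; the latent stores `d_latent = 64`. That per-token shrinkage of the inference-time key-value cache is the bottleneck the bullet says MLA eliminates.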
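The FFN bullet can be read the same way through a generic top-k-routed MoE sketch: a learned router sends each token to only a few small expert FFNs, so total parameters grow with the expert count while per-token compute stays roughly flat, which is how an MoE trains a stronger model at lower cost. The code below is a plain top-2 router for illustration only; it does not model DeepSeekMoE's specific design (fine-grained expert segmentation and shared experts), and all sizes are hypothetical.

```python
# Illustrative generic top-k MoE FFN (not the DeepSeekMoE implementation).
import torch
import torch.nn as nn

d_model, d_expert, n_experts, top_k = 1024, 256, 8, 2  # made-up sizes

class TopKMoEFFN(nn.Module):
    def __init__(self):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        )

    def forward(self, h):
        flat = h.reshape(-1, d_model)
        # Pick the top-k experts per token and renormalize their gate weights.
        gates, idx = torch.softmax(self.router(flat), dim=-1).topk(top_k, dim=-1)
        gates = gates / gates.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            hits = idx == e                       # (tokens, top_k) hits for expert e
            token_ids = hits.any(dim=-1).nonzero(as_tuple=True)[0]
            if token_ids.numel() == 0:
                continue  # no token routed to this expert in this batch
            w = (gates * hits)[token_ids].sum(dim=-1, keepdim=True)
            out[token_ids] += w * expert(flat[token_ids])
        return out.view_as(h)

moe = TopKMoEFFN()
print(moe(torch.randn(2, 5, d_model)).shape)  # torch.Size([2, 5, 1024])
```

Each token here touches only 2 of the 8 expert FFNs, so activated compute is about a quarter of a dense model with the same total FFN parameters; that is the "stronger models at lower costs" trade the bullet gestures at.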