typo fix
README.md
CHANGED
@@ -271,7 +271,7 @@ Granite-3.0-8B-Base is based on a decoder-only dense transformer architecture. C
 | Initialization std | 0.1 | **0.1** | 0.1 | 0.1 |
 | Sequence Length | 4096 | **4096** | 4096 | 4096 |
 | Position Embedding | RoPE | **RoPE** | RoPE | RoPE |
-| #
+| # Parameters | 2.5B | **8.1B** | 1.3B | 3.3B |
 | # Active Parameters | 2.5B | **8.1B** | 400M | 800M |
 | # Training tokens | 12T | **12T** | 10T | 10T |
 