gradientai/Llama-3-70B-Instruct-Gradient-262k

@@ -9,7 +9,7 @@ license: llama3
 ---
 <a href="https://www.gradient.ai" target="_blank"><img src="https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/TSa3V8YpoVagnTYgxiLaO.png" width="200"/></a>
-# Llama-3 70B Gradient Instruct 262K
+# Llama-3 70B Instruct Gradient 262K
 Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. If you're looking to build custom AI models or agents, email us a message [email protected].
@@ -40,14 +40,14 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From         | 65K                  | 262K       |
 |-------------------------|----------------------|------------|
 | Sequence Length 2^N     | 16                   | 18         |
-| RoPE theta              | 15,296,098           | 207,112,184|
+| RoPE Theta              | 15,296,098           | 207,112,184|
 | Batch Size              | 1                    | 1          |
 | Gradient Accumulation Steps | 1               | 1          |
 | Steps                   | 20                   | 25         |
 | Total Tokens            | 83,886,080           | 104,857,600|
-| Learning rate           | 0.00002              | 0.00002    |
+| Learning Rate           | 0.00002              | 0.00002    |
 | # GPUs                  | 512                  | 512        |
-| Ring parallelism        | 64                   | 16         |
+| Ring Parallelism        | 64                   | 16         |
 | GPU Type                | NVIDIA L40S          | NVIDIA L40S|
 | Minutes to Train (Wall) | 100                  | 170        |
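
As a sanity check on the training table in this hunk, the Total Tokens row can be reproduced from the Steps and Sequence Length rows if each step processes 64 and 16 full-length sequences respectively. That per-step count is inferred from the arithmetic and is not stated in the model card. It happens to equal the Ring Parallelism row, but whether that is the intended relationship is an assumption. A minimal Python sketch, with `seqs_per_step` as a hypothetical helper name:

```python
# Values copied from the two columns of the training table.
# `seqs_per_step` is an inferred quantity, not a row in the model card.
STAGES = {
    "2^16": {"log2_seq_len": 16, "steps": 20, "total_tokens": 83_886_080},
    "2^18": {"log2_seq_len": 18, "steps": 25, "total_tokens": 104_857_600},
}


def seqs_per_step(stage: dict) -> int:
    """Return how many full-length sequences per step explain Total Tokens."""
    tokens_per_seq = 2 ** stage["log2_seq_len"]
    per_step, remainder = divmod(
        stage["total_tokens"], stage["steps"] * tokens_per_seq
    )
    if remainder:
        raise ValueError("Total Tokens is not a whole multiple of steps * 2^N")
    return per_step


for name, stage in STAGES.items():
    print(f"sequence length {name}: {seqs_per_step(stage)} sequences per step")
```

Both columns divide evenly, so the Steps, Sequence Length, and Total Tokens rows are at least mutually consistent.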