leo-pekelis-gradient committed on
Commit 7484062
1 Parent(s): ea25e66

Update README.md

Files changed (1)
  1. README.md +13 -15
README.md CHANGED
@@ -37,21 +37,19 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 
 **Progressive Training Details:**
 
-| | 65K | 262K | 524K |
-|------------------------|-----------|-----------|------------|
-| Initialize From | Llama-3-70B-Instruct | 65K | 262K |
-| Sequence Length 2^N | 16 | 18 | 19 |
-| RoPE theta | 15296098 | 207112184 | 1062356830 |
-| Batch Size | 1 | 1 | 1 |
-| Gradient Accumulation Steps | 1 | 1 | 2 |
-| Steps | 20 | 25 | 25 |
-| Total Tokens | 83886080 | 104857600 | 209715200 |
-| Learning rate | 2.00E-05 | 2.00E-05 | 2.00E-05 |
-| # GPUs | 512 | 512 | 512 |
-| Ring parallelism | 64 | 16 | 8 |
-| GPU Type | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
-| Minutes to Train (Wall)| 100 | 170 | 284 |
-
+| Initialize From | 65K | 262K |
+|-------------------------|----------------------|------------|
+| Sequence Length 2^N | 16 | 18 |
+| RoPE theta | 15,296,098 | 207,112,184|
+| Batch Size | 1 | 1 |
+| Gradient Accumulation Steps | 1 | 1 |
+| Steps | 20 | 25 |
+| Total Tokens | 83,886,080 | 104,857,600|
+| Learning rate | 0.00002 | 0.00002 |
+| # GPUs | 512 | 512 |
+| Ring parallelism | 64 | 16 |
+| GPU Type | NVIDIA L40S | NVIDIA L40S|
+| Minutes to Train (Wall) | 100 | 170 |
 
 **Evaluation Details:**
 
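A quick way to sanity-check the Total Tokens row when reviewing this change: in all three columns of the removed table, the listed value equals Steps × Gradient Accumulation Steps × Batch Size × 2^N × Ring parallelism. The README does not state this formula, so treating the Ring parallelism entry as a multiplicative factor is an assumption that simply happens to reproduce the listed numbers. The names `STAGES` and `total_tokens` below are hypothetical and used only for illustration.

```python
# Values copied from the removed (pre-commit) table in the diff above.
STAGES = {
    "65K": {"seq_len_log2": 16, "batch_size": 1, "grad_accum": 1,
            "steps": 20, "ring": 64, "listed_tokens": 83_886_080},
    "262K": {"seq_len_log2": 18, "batch_size": 1, "grad_accum": 1,
             "steps": 25, "ring": 16, "listed_tokens": 104_857_600},
    "524K": {"seq_len_log2": 19, "batch_size": 1, "grad_accum": 2,
             "steps": 25, "ring": 8, "listed_tokens": 209_715_200},
}


def total_tokens(stage: dict) -> int:
    """Assumed formula: steps * grad_accum * batch_size * 2**N * ring.

    Using the 'Ring parallelism' entry as a multiplier is an assumption,
    not something the README states.
    """
    seq_len = 2 ** stage["seq_len_log2"]
    return (stage["steps"] * stage["grad_accum"] * stage["batch_size"]
            * seq_len * stage["ring"])


if __name__ == "__main__":
    for name, stage in STAGES.items():
        computed = total_tokens(stage)
        status = "matches" if computed == stage["listed_tokens"] else "differs"
        print(f"{name}: computed {computed:,} ({status} the listed value)")
```

Under that assumption, each optimizer step in every stage covers the same number of tokens per unit of gradient accumulation (2^N × Ring parallelism = 4,194,304), which may help when checking the two columns that remain after this commit.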