tangled-alpha-0.4-core


time python -B prepare_core_datasets.py
i=0, min_len=0, max_len=1048576, block_size=4097, chunk_size=16388000, len(dataset)=1567386, len(dataset) * block_size=6421580442
Total number of tokens in the optimized dataset '../core-data-0-0-1048576-4097-4000' is 6421580442
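The numbers in this output are internally consistent: block_size=4097 is a 4096-token context window plus one extra token for the next-token target shift, chunk_size=16388000 is 4097 × 4000 (4000 blocks per optimized chunk, matching the -4097-4000 suffix of the output directory), and 1,567,386 blocks × 4097 tokens gives the reported 6,421,580,442 tokens. The prepare script itself is not reproduced in this card; as a hedged sketch, assuming the dataset is written in litdata's optimized streaming format (the format litgpt's streaming pretraining data expects), it can be read back roughly like this, with only the directory path and block size taken from the log:

```python
from litdata.streaming import StreamingDataset, StreamingDataLoader, TokensLoader

# Hedged sketch: stream the optimized token dataset produced by prepare_core_datasets.py.
# Only the path and block size come from the log above; the rest is illustrative.
dataset = StreamingDataset(
    input_dir="../core-data-0-0-1048576-4097-4000",
    item_loader=TokensLoader(block_size=4097),  # 4096-token context + 1 target token
    shuffle=True,
    seed=23,
)
loader = StreamingDataLoader(dataset, batch_size=4, num_workers=4)

for batch in loader:
    inputs, targets = batch[:, :-1], batch[:, 1:]  # standard next-token shift
    print(inputs.shape, targets.shape)             # (4, 4096) (4, 4096)
    break
```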
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
Seed set to 23
Time to instantiate model: 0.23 seconds.
Total parameters: 185,631,232
Verifying settings ...
Measured TFLOPs: 7047.32
Epoch 1 | iter 256 step 1 | loss train: 11.714, val: n/a | iter time: 370.39 ms (step) remaining time: 4 days, 1:24:16
Epoch 1 | iter 512 step 2 | loss train: 11.711, val: n/a | iter time: 311.97 ms (step) remaining time: 3 days, 8:48:48
Epoch 1 | iter 768 step 3 | loss train: 11.708, val: n/a | iter time: 313.48 ms (step) remaining time: 3 days, 3:22:46
Epoch 1 | iter 1024 step 4 | loss train: 11.704, val: n/a | iter time: 313.71 ms (step) remaining time: 3 days, 0:41:32
Epoch 1 | iter 1280 step 5 | loss train: 11.694, val: n/a | iter time: 314.42 ms (step) remaining time: 2 days, 23:05:08
Epoch 1 | iter 1536 step 6 | loss train: 11.687, val: n/a | iter time: 314.62 ms (step) remaining time: 2 days, 22:00:35
Epoch 1 | iter 1792 step 7 | loss train: 11.668, val: n/a | iter time: 314.94 ms (step) remaining time: 2 days, 21:14:06
Epoch 1 | iter 2048 step 8 | loss train: 11.645, val: n/a | iter time: 316.28 ms (step) remaining time: 2 days, 20:39:12
Epoch 1 | iter 2304 step 9 | loss train: 11.630, val: n/a | iter time: 315.29 ms (step) remaining time: 2 days, 20:11:52
Epoch 1 | iter 2560 step 10 | loss train: 11.609, val: n/a | iter time: 315.53 ms (step) remaining time: 2 days, 19:49:36
Epoch 1 | iter 2816 step 11 | loss train: 11.564, val: n/a | iter time: 314.95 ms (step) remaining time: 2 days, 19:31:09
Epoch 1 | iter 3072 step 12 | loss train: 11.510, val: n/a | iter time: 314.23 ms (step) remaining time: 2 days, 19:15:24
Epoch 1 | iter 3328 step 13 | loss train: 11.453, val: n/a | iter time: 315.71 ms (step) remaining time: 2 days, 19:02:02
Epoch 1 | iter 3584 step 14 | loss train: 11.411, val: n/a | iter time: 316.43 ms (step) remaining time: 2 days, 18:50:24
Epoch 1 | iter 3840 step 15 | loss train: 11.346, val: n/a | iter time: 314.83 ms (step) remaining time: 2 days, 18:40:08
Epoch 1 | iter 4096 step 16 | loss train: 11.300, val: n/a | iter time: 314.94 ms (step) remaining time: 2 days, 18:30:57
Epoch 1 | iter 4352 step 17 | loss train: 11.237, val: n/a | iter time: 314.13 ms (step) remaining time: 2 days, 18:22:39
Epoch 1 | iter 4608 step 18 | loss train: 11.193, val: n/a | iter time: 314.85 ms (step) remaining time: 2 days, 18:15:08
Epoch 1 | iter 4864 step 19 | loss train: 11.131, val: n/a | iter time: 315.23 ms (step) remaining time: 2 days, 18:08:16
Epoch 1 | iter 5120 step 20 | loss train: 11.084, val: n/a | iter time: 314.08 ms (step) remaining time: 2 days, 18:03:14
# ...
Epoch 1 | iter 780800 step 3050 | loss train: 3.176, val: 3.554 | iter time: 314.97 ms (step) remaining time: 0:15:21
Epoch 1 | iter 781056 step 3051 | loss train: 3.207, val: 3.554 | iter time: 315.53 ms (step) remaining time: 0:14:05
Epoch 1 | iter 781312 step 3052 | loss train: 3.186, val: 3.554 | iter time: 315.74 ms (step) remaining time: 0:12:48
Epoch 1 | iter 781568 step 3053 | loss train: 3.189, val: 3.554 | iter time: 315.17 ms (step) remaining time: 0:11:32
Epoch 1 | iter 781824 step 3054 | loss train: 3.305, val: 3.554 | iter time: 315.29 ms (step) remaining time: 0:10:15
Epoch 1 | iter 782080 step 3055 | loss train: 3.173, val: 3.554 | iter time: 315.11 ms (step) remaining time: 0:08:59
Epoch 1 | iter 782336 step 3056 | loss train: 3.223, val: 3.554 | iter time: 315.35 ms (step) remaining time: 0:07:42
Epoch 1 | iter 782592 step 3057 | loss train: 3.182, val: 3.554 | iter time: 315.18 ms (step) remaining time: 0:06:26
Epoch 1 | iter 782848 step 3058 | loss train: 3.196, val: 3.554 | iter time: 316.37 ms (step) remaining time: 0:05:09
Epoch 1 | iter 783104 step 3059 | loss train: 3.187, val: 3.554 | iter time: 315.86 ms (step) remaining time: 0:03:53
Epoch 1 | iter 783360 step 3060 | loss train: 3.163, val: 3.554 | iter time: 314.81 ms (step) remaining time: 0:02:36
Epoch 1 | iter 783616 step 3061 | loss train: 3.190, val: 3.554 | iter time: 315.23 ms (step) remaining time: 0:01:20
Epoch 2 | iter 783872 step 3062 | loss train: 3.239, val: 3.554 | iter time: 317.71 ms (step) remaining time: 0:00:03
Validating ...
Final evaluation | val loss: 3.552 | val ppl: 34.896
Saving checkpoint to '../out/pretrain-core/final/lit_model.pth'
----------------------------------------
| Performance
| - Total tokens  : 6,421,577,728
| - Training Time : 234340.96 s
| - Tok/sec       : 17286.07 tok/s
| ----------------------------------------
| Memory Usage
| - Memory Used   : 17.30 GB
----------------------------------------
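Two of the reported numbers can be cross-checked directly from the log: the validation perplexity is exp(val loss), and the iter/step counters show that 256 data-loader iterations are accumulated per optimizer step (iter 256 → step 1, iter 512 → step 2, ..., iter 783872 → step 3062).

```python
import math

# Values copied from the training log above.
val_loss = 3.552
print(math.exp(val_loss))  # ≈ 34.9, matching the reported val ppl of 34.896 up to rounding

# Gradient accumulation: 256 iterations per optimizer step.
iters_per_step = 256
final_step = 3062
print(iters_per_step * final_step)  # 783872, the final iter count in the log
```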

Back up the wandb run directory:

mv wandb wandb-pretrain-core

Chat with the model:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
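Besides the interactive litgpt chat session, the checkpoint can be loaded programmatically. A minimal sketch, assuming a litgpt version that ships the Python LLM API; note the model is only pretrained, so expect raw continuation rather than instruction-following:

```python
from litgpt import LLM

# Load the pretrained core checkpoint saved above (path taken from the log).
llm = LLM.load("../out/pretrain-core/final")

# Base model: plain next-token continuation, not chat-style answers.
text = llm.generate("The capital of France is", max_new_tokens=32, top_k=50, temperature=0.8)
print(text)
```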
Evaluate the model:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
# ...
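The leaderboard output is elided above. litgpt evaluate drives lm-evaluation-harness; assuming the aggregated scores end up in a results.json under the --out_dir passed to the command (if your litgpt version writes a different filename, use the path the command prints at the end), they can be inspected like this:

```python
import json
from pathlib import Path

# Hypothetical results path; adjust to whatever file litgpt evaluate reports writing.
results_path = Path("../evaluate/pretrain-core-0/leaderboard/results.json")
results = json.loads(results_path.read_text())

# lm-evaluation-harness stores per-task metrics under the "results" key.
for task, metrics in results.get("results", {}).items():
    print(task, metrics)
```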